Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution

2018-01-08 Thread Ingo Molnar

* Alan Cox  wrote:

> On Mon, 8 Jan 2018 11:08:36 +0100
> Peter Zijlstra  wrote:
> 
> > On Fri, Jan 05, 2018 at 10:30:16PM -0800, Dan Williams wrote:
> > > On Fri, Jan 5, 2018 at 6:22 PM, Eric W. Biederman  
> > > wrote:  
> > > > In at least one place (mpls) you are patching a fast path.  Compile out
> > > > or don't load mpls by all means.  But it is not acceptable to change the
> > > > fast path without even considering performance.  
> > > 
> > > Performance matters greatly, but I need help to identify a workload
> > > that is representative for this fast path to see what, if any, impact
> > > is incurred. Even better is a review that says "nope, 'index' is not
> > > subject to arbitrary userspace control at this point, drop the patch."  
> > 
> > I think we're focussing a little too much on pure userspace. That is, we
> > should be saying under the attackers control. Inbound network packets
> > could equally be under the attackers control.
> 
> Inbound network packets don't come with a facility to read back and do
> cache timimg. [...]

But the reply packets can be measured on the sending side, and the total delay 
timing would thus carry the timing information.

Yes, a lot of noise gets added that way if we think 'packet goes through the 
Internet' - but with gigabit local network access or even through localhost
access a lot of noise can be removed as well.

It's not as dangerous as a near instantaneous local attack, but 'needs a day of 
runtime to brute-force through localhost or 10GigE' is still worrying in many 
real-world security contexts.

So I concur with Peter that we should generally consider making all of our 
responses to external data (maybe with the exception of pigeon post messages) 
Spectre-safe.

Thanks,

Ingo


Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands

2017-01-17 Thread Ingo Molnar

* Martin K. Petersen <martin.peter...@oracle.com> wrote:

> >>>>> "James" == James Bottomley <james.bottom...@hansenpartnership.com> 
> >>>>> writes:
> 
> James> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough
> James> commands
> 
> James> mpt3sas has a firmware failure where it can only handle one pass
> James> through ATA command at a time.  If another comes in, contrary to
> James> the SAT standard, it will hang until the first one completes
> James> (causing long commands like secure erase to timeout).  The
> James> original fix was to block the device when an ATA command came in,
> James> but this caused a regression with
> 
> Broadcom folks: Please test and ack as soon as possible so we can get
> this fix queued up.
> 
> Ingo: Since you appear to have hardware, it would be great if you could
> test James' v3 (https://patchwork.kernel.org/patch/9519383/). Sorry for
> the inconvenience.

As per the interdiff below v2->v3 did not change the code in any way, only the 
name of the function and a comment, so you can add this to v3 as well:

  Reported-by: Ingo Molnar <mi...@kernel.org>
  Tested-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6f9b4c051e4d..830e2c10ba02 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,7 +3899,7 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
}
 }
 
-static int set_satl_pending(struct scsi_cmnd *scmd, bool pending)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
 
@@ -3934,7 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
if (!scmd)
continue;
count++;
-   set_satl_pending(scmd, false);
+   _scsih_set_satl_pending(scmd, false);
mpt3sas_base_free_smid(ioc, smid);
scsi_dma_unmap(scmd);
if (ioc->pci_error_recovery)
@@ -4084,7 +4084,9 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
}
 
/*
-* Bug work around for firmware SATL handling
+* Bug work around for firmware SATL handling.  The loop
+* is based on atomic operations and ensures consistency
+* since we're lockless at this point
 */
do {
if (sas_device_priv_data->ata_command_pending) {
@@ -4092,7 +4094,7 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scmd->scsi_done(scmd);
return 0;
}
-   } while (set_satl_pending(scmd, true));
+   } while (_scsih_set_satl_pending(scmd, true));
 
sas_target_priv_data = sas_device_priv_data->sas_target;
 
@@ -4661,7 +4663,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
-   set_satl_pending(scmd, false);
+   _scsih_set_satl_pending(scmd, false);
 
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-16 Thread Ingo Molnar

* James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On Mon, 2017-01-16 at 10:22 +0100, Ingo Molnar wrote:
> > * James Bottomley <james.bottom...@hansenpartnership.com> wrote:
> > 
> > > On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote:
> > > > So there's a new mpt3sas SCSI driver boot regression, introduced
> > > > in 
> > > > this merge window, which made one of my servers unbootable.
> > > 
> > > We're not reverting a fix that would cause regressions for others. 
> > 
> > You really need to reconsider that stance ...
> > 
> > > However, The fix was manifestly wrong, so does this fix of the fix
> > > work for you:
> > > 
> > > http://marc.info/?l=linux-scsi=148329237807604
> > > 
> > > It's been languishing a bit because no-one seemed to care enough to
> > > test or review it.  IOf you can add a tested by, that will give the
> > > two
> > > we need to push it.
> > 
> > I have tested your other patch that you pointed to:
> > 
> >   http://marc.info/?l=linux-scsi=148449968522828
> > 
> > Which patch fixes the bug too (I removed my revert first) - so you
> > can add my:
> > 
> >   Reported-by: Ingo Molnar <mi...@kernel.org>
> >   Tested-by: Ingo Molnar <mi...@kernel.org>
> 
> Thanks ... just checking you tested the second version with the
> concurrency part?

I tested the one I linked to:

   http://marc.info/?l=linux-scsi=148449968522828

I don't know which version that is exactly, I just tested what you asked me to 
test.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-16 Thread Ingo Molnar

* James Bottomley <james.bottom...@hansenpartnership.com> wrote:

> On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote:
> > So there's a new mpt3sas SCSI driver boot regression, introduced in 
> > this merge window, which made one of my servers unbootable.
> 
> We're not reverting a fix that would cause regressions for others. 

You really need to reconsider that stance ...

> However, The fix was manifestly wrong, so does this fix of the fix work for 
> you:
> 
> http://marc.info/?l=linux-scsi=148329237807604
> 
> It's been languishing a bit because no-one seemed to care enough to
> test or review it.  IOf you can add a tested by, that will give the two
> we need to push it.

I have tested your other patch that you pointed to:

  http://marc.info/?l=linux-scsi=148449968522828

Which patch fixes the bug too (I removed my revert first) - so you can add my:

  Reported-by: Ingo Molnar <mi...@kernel.org>
  Tested-by: Ingo Molnar <mi...@kernel.org>

BTW., is it wise to work around the out of spec firmware in the mpt3sas code 
and 
leave the overly optimistic assumptions in the SCSI code intact? The problem is 
that other SCSI hardware could be affected as well - and especially enterprise 
class server hardware has long testing and thus regression latencies (as my 
example proves).

Wouldn't it be more robust to only submit one pass-through command at a time 
from 
the SCSI layer, and maybe opt-in hardware that is known to implement the SAT 
standard fully?

(But I'm just kibitzing here really.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread Ingo Molnar
nux-foundation.org>
  Date:   Wed Dec 14 10:49:33 2016 -0800

Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

... which is a head-scratcher, so I double checked the key bisection points, 
but 
the bisection result is robust. I also re-created Linus's merge and double 
checked 
the conflict resolution - which looks fine as well.

After (much) more testing it turns out that the bug is some sort of combination 
bug, in that scsi-next didn't have all the SCSI fixes that upstream already 
had, 
in particular it didn't have these commits:

  7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset
  18f6084a989b scsi: mpt3sas: Fix secure erase premature termination
  6d3a56ed0985 scsi: mpt3sas: Fix for block device of raid exists even after 
deleting raid disk

When Linus pulled in scsi-next-minus-fixes these two sets of commits combined 
and 
produced the regression - and made the bisection lead to the merge commit.

So I manually rebased those 3 fixes on top of the scsi-next tree (f5b893c94715) 
and indeed one of them broke my box:

  18f6084a989b scsi: mpt3sas: Fix secure erase premature termination

I reverted it from latest upstream (with a minor conflict resolution), and that 
makes my box boot fine again. I have no idea which scsi-next commit this change 
interacted with, and it's not easy to find out so I'm not volunteering! It must 
be 
one of these 256 commits:

   e3a00f68e426..f5b893c94715

Note that reverting the first commit alone does not help:

  7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset

So it's reverting 18f6084a989b (while keeping ata_12_16_cmd() around to enable 
the 
7ff723ad0f87 fix) that does the trick.

Thanks,

Ingo

>
>From 0734e6d2a7f757172d6b7750d8fcf602909300e6 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mi...@kernel.org>
Date: Sun, 15 Jan 2017 09:59:39 +0100
Subject: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

This reverts commit 18f6084a989ba1b38702f9af37a2e4049a924be6.

 Conflicts:
drivers/scsi/mpt3sas/mpt3sas_scsih.c

Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e319d3..3573daa2cce8 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4063,13 +4063,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
if (ioc->logging_level & MPT_DEBUG_SCSI)
scsi_print_command(scmd);
 
-   /*
-* Lock the device for any subsequent command until command is
-* done.
-*/
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_block(scmd->device);
-
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4650,9 +4643,6 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
-
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)

2016-04-19 Thread Ingo Molnar

* Josh Poimboeuf <jpoim...@redhat.com> wrote:

> On Sat, Apr 16, 2016 at 11:03:32AM +0200, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf <jpoim...@redhat.com> wrote:
> > 
> > > > I don't think we know yet if there's a reliable way to turn the bug off.
> > > > 
> > > > Also, according to the gcc guys, this bug won't always result in a
> > > > truncated function, and may sometimes just make some inline function
> > > > call sites disappear:
> > > > 
> > > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> > > > 
> > > > though I haven't been able to confirm that experimentally.  But if it's
> > > > true, that means that objtool won't be able to detect all cases of the
> > > > bug and some function calls may just silently disappear!
> > > > 
> > > > There's a lot of activity in the bug now, so hopefully they'll be able
> > > > to tell us soon if there's a reliable way to avoid it and/or detect it.
> > > > 
> > > > BTW, Denys posted a workaround patch for the qla2 code:
> > > > 
> > > >   
> > > > https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlas...@redhat.com
> > > 
> > > Martin Jambor wrote a succinct summary of the conditions needed for this
> > > bug:
> > > 
> > >   "This bug can occur when an inlineable function containing a call to
> > >   __builtin_constant_p, which checks a parameter or a value it
> > >   references and a (possibly indirect) caller of the function actually
> > >   passes a constant, but stores it using a type of a different size."
> > > 
> > > So to prevent it from happening elsewhere in the kernel, it sounds like
> > > we'd have to either remove all uses of __builtin_constant_p() or disable
> > > inlining completely.
> > > 
> > > There's also no reliable way to detect the bug has occurred, though
> > > objtool will detect it in cases when the function gets truncated.
> > 
> > So it appears to me that due to the hard to detect nature of the GCC bug 
> > the fix 
> > will probably be backported by them, so I think we should be fine with 
> > relying on 
> > objtool to detect weird code sequences in the kernel, and should work 
> > around 
> > specific instances of the bug.
> 
> I agree.  So how should we work around the bug in this case?  There have
> been several suggestions:
> 
> - change wwn_to_u64() to __always_inline
> 
> - change qla2x00_get_host_fabric_name() to skip the unnecessary call to
>   wwn_to_u64()
> 
> - revert one of the two commits:
>   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some 
> byteswap operations")
>   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

The first option sounds like the best one by far: it does a change that is 
related 
to the GCC bug (tweaks inlining), has near zero impact and does not revert 
other 
useful progress.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)

2016-04-16 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> > I don't think we know yet if there's a reliable way to turn the bug off.
> > 
> > Also, according to the gcc guys, this bug won't always result in a
> > truncated function, and may sometimes just make some inline function
> > call sites disappear:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14
> > 
> > though I haven't been able to confirm that experimentally.  But if it's
> > true, that means that objtool won't be able to detect all cases of the
> > bug and some function calls may just silently disappear!
> > 
> > There's a lot of activity in the bug now, so hopefully they'll be able
> > to tell us soon if there's a reliable way to avoid it and/or detect it.
> > 
> > BTW, Denys posted a workaround patch for the qla2 code:
> > 
> >   
> > https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlas...@redhat.com
> 
> Martin Jambor wrote a succinct summary of the conditions needed for this
> bug:
> 
>   "This bug can occur when an inlineable function containing a call to
>   __builtin_constant_p, which checks a parameter or a value it
>   references and a (possibly indirect) caller of the function actually
>   passes a constant, but stores it using a type of a different size."
> 
> So to prevent it from happening elsewhere in the kernel, it sounds like
> we'd have to either remove all uses of __builtin_constant_p() or disable
> inlining completely.
> 
> There's also no reliable way to detect the bug has occurred, though
> objtool will detect it in cases when the function gets truncated.

So it appears to me that due to the hard to detect nature of the GCC bug the 
fix 
will probably be backported by them, so I think we should be fine with relying 
on 
objtool to detect weird code sequences in the kernel, and should work around 
specific instances of the bug.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)

2016-04-14 Thread Ingo Molnar

* Denys Vlasenko  wrote:

> > In fact, the following patch seems to fix it:
> > 
> > diff --git a/include/scsi/scsi_transport_fc.h 
> > b/include/scsi/scsi_transport_fc.h
> > index bf66ea6..56b9e81 100644
> > --- a/include/scsi/scsi_transport_fc.h
> > +++ b/include/scsi/scsi_transport_fc.h
> > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
> > return result;
> >  }
> >  
> > -static inline u64 wwn_to_u64(u8 *wwn)
> > +static __always_inline u64 wwn_to_u64(u8 *wwn)
> >  {
> > return get_unaligned_be64(wwn);
> >  }
> 
> It is not a guarantee.

Of course it's a workaround - but is there any deterministic way to turn off 
this 
GCC bug (by activating some GCC command line switch), or do we have to live 
with 
objtool warning about this GCC?

Which, by the way, is pretty cool!

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Decouple X86_32 dependency from the ISA Kconfig option

2016-04-13 Thread Ingo Molnar

* William Breathitt Gray <vilhelm.g...@gmail.com> wrote:

> On Wed, Apr 13, 2016 at 09:26:02AM +0200, Ingo Molnar wrote:
> >What's the practical motivation of this? What exact hardware is this for?
> >
> >Thanks,
> >
> > Ingo
> 
> The PC/104 bus is equivalent to the ISA bus regarding software
> communication. Many small form factor systems have a PC/104 bus where
> PC/104 cards may be stacked. Nowadays, these systems are commonly
> running 64-bit processors such as the Intel Atom.
> 
> I would like to utilize the ISA bus driver to support these PC/104
> devices (see http://lkml.org/lkml/2016/4/7/418), but the ISA
> configuration option has an arbitrary X86_32 dependency. Decoupling the
> X86_32 dependency from the ISA configuration option will allow these
> PC/104 drivers to build for 64-bit architectures.
> 
> The existing kernel drivers which I intend to utilize the ISA bus driver
> in a X86_64 architecture for PC/104 support are the ACCES 104-DIO-48E
> GPIO driver, the ACCES 104-IDI-48 GPIO driver, the ACCES 104-IDIO-16
> GPIO driver, and the Apex Embedded Systems STX104 DAC driver. I have
> several more PC/104 devices for which I wish to write drivers, but I
> would like to resolve this ISA bus driver situation before submitting
> new code.

Ah, ok, so it's for enabling real hardware, not just a cleanup, right? You 
might 
want to put that info into the boilerplate mail or so.

I'm perfectly fine with all the patches that touch x86 code:

  Acked-by: Ingo Molnar <mi...@kernel.org>

I suppose you'd like to have these in the driver tree, all in one place?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Decouple X86_32 dependency from the ISA Kconfig option

2016-04-13 Thread Ingo Molnar

* William Breathitt Gray  wrote:

> This patchset is based on top of commit 3a3a5fece6f2 ("fs: kernfs: Replace
> CURRENT_TIME by current_fs_time()") of the driver-core-testing branch of
> the driver-core repository.
> 
> The introduction of the ISA_BUS option in commit b3c1be1b789c
> ("base: isa: Remove X86_32 dependency") blocks the compilation of ISA
> drivers on non-x86 platforms. The ISA_BUS configuration option should not
> be necessary if the X86_32 dependency can be decoupled from the ISA
> configuration option. This patchset both removes the ISA_BUS configuration
> option entirely and decouples the X86_32 dependency from the ISA
> configuration option.
> 
> The PNPBIOS driver requires preprocessor defines (located in
> include/asm/segment.h) only declared if the architecture is set to X86_32.
> If the architecture is set to X86_64, the PNPBIOS driver will not build
> properly. The X86 dependecy for the PNPBIOS configuration option is changed
> to an explicit X86_32 dependency in order to prevent an attempt to build
> for an unsupported architecture.
> 
> Changes to the ISA SSCAPE and SCSI ULTRASTOR drivers are also included. The
> relevant patches simply fix format string identifier mismatches exposed
> during an attempted X86_64 compilation after the decoupling of the X86_32
> dependency from the ISA configuration option. These patches fix compilation
> warnings rather than errors, but the solutions were so trivial that I
> decided to include them in this patchset. If it would be inappropriate to
> include them in this patchset, let me know and I will rebase to remove the
> relevant patches.
> 
> William Breathitt Gray (4):
>   pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
>   sound: isa: sscape: Use correct format identifier for size_t
>   scsi: ultrastor: Use correct format identifier for kernel pointer
>   isa: Remove the ISA_BUS Kconfig option
> 
>  arch/x86/Kconfig| 10 ++
>  drivers/base/Makefile   |  2 +-
>  drivers/pnp/pnpbios/Kconfig |  2 +-
>  drivers/scsi/ultrastor.c|  8 
>  include/linux/isa.h |  2 +-
>  sound/isa/sscape.c  |  2 +-
>  6 files changed, 10 insertions(+), 16 deletions(-)

What's the practical motivation of this? What exact hardware is this for?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] x86: remove vmalloc.h from asm/io.h

2015-05-29 Thread Ingo Molnar

* Stephen Rothwell s...@canb.auug.org.au wrote:

 Nothing in asm/io.h uses anything from vmalloc.h, so remove the include
 and fix up the build problems in an allmodconfig (64 bit and 32 bit)
 build.
 
 This may be the place where x86 builds get vmalloc.h implicitly included
 and that tends to hide places where vmalloc() et al are added to files
 but the include of vmalloc.h is forgotten.

Good idea.

Acked-by: Ingo Molnar mi...@kernel.org

 Based in Linus' tree of today.
 
 There are probably more places that need vmalloc.h included, but this passes 
 64 
 bit and 32 bit allmodconfig builds, so is a place to start.

Please also test x86 allnoconfig and defconfig 32/64, that tends to unearth the 
remaining places. People doing randconfig testing will find the rest.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)

2013-12-24 Thread Ingo Molnar

* Oleg Nesterov o...@redhat.com wrote:

 On 12/23, Ingo Molnar wrote:
 
  * Oleg Nesterov o...@redhat.com wrote:
 
   Initially I thought that this is obviously wrong, irqsave/irqrestore
   assume that flags is owned by the caller, not by the lock. And
   iirc this was certainly wrong in the past.
  
   But when I look at spinlock.c it seems that this code can actually
   work. _irqsave() writes to FLAGS after it takes the lock, and
   _irqrestore() has a copy of FLAGS before it drops this lock.
 
  I don't think that's true: if it was then the lock would not be
  irqsave, a hardware-irq could come in after the lock has been taken
  and before flags are saved+disabled.
 
 I do agree that this pattern is not safe, that is why I decided to ask.
 
 But, unless I missed something, with the current implementation
 spin_lock_irqsave(lock, global_flags) does:
 
   unsigned long local_flags;
 
   local_irq_save(local_flags);
   spin_lock(lock);
 
   global_flags = local_flags;
 
 so the access to global_flags is actually serialized by lock.

You are right, today that's true technically because IIRC due to Sparc 
quirks we happen to return 'flags' as a return value - still it's very 
ugly and it could break anytime if we decide to do more aggressive 
optimizations and actually directly save into 'flags'.

Note that even today there's a narrow exception: on UP we happen to 
build it the other way around, so that we do:

local_irq_save(global_flags);
__acquire(lock);

This does not matter for any real code because on UP there is no 
physical lock and __acquire() is empty code-wise, but any compiler 
driven locking analysis tool using __attribute__ __context__(), if 
built on UP, would see the unsafe locking pattern.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)

2013-12-23 Thread Ingo Molnar

* Oleg Nesterov o...@redhat.com wrote:

 On 12/23, Oleg Nesterov wrote:
 
  Perhaps we should ask the maintainers upstream? Even if this works, I am
  not sure this is _supposed_ to work. I mean, in theory spin_lock_irqave()
  can be changed as, say
 
  #define spin_lock_irqsave(lock, flags)  \
  do {\
  local_irq_save(flags);  \
  spin_lock(lock);\
  } while (0)
 
  (and iirc it was defined this way a long ago). In this case flags is
  obviously not protected.
 
 Yes, lets ask the maintainers.
 
 In short, is this code
 
   spinlock_t LOCK;
   unsigned long FLAGS;
 
   void my_lock(void)
   {
   spin_lock_irqsave(LOCK, FLAGS);
   }
 
   void my_unlock(void)
   {
   spin_unlock_irqrestore(LOCK, FLAGS);
   }
 
 correct or not?
 
 Initially I thought that this is obviously wrong, irqsave/irqrestore 
 assume that flags is owned by the caller, not by the lock. And 
 iirc this was certainly wrong in the past.
 
 But when I look at spinlock.c it seems that this code can actually 
 work. _irqsave() writes to FLAGS after it takes the lock, and 
 _irqrestore() has a copy of FLAGS before it drops this lock.

I don't think that's true: if it was then the lock would not be 
irqsave, a hardware-irq could come in after the lock has been taken 
and before flags are saved+disabled.

So AFAICS this is an unsafe pattern, beyond being ugly as hell.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

 On Sun, Jun 23, 2013 at 12:09 AM, Ingo Molnar mi...@kernel.org wrote:
 
  The spinning approach you add has the disadvantage of actively wasting 
  CPU time, which could be used to run other tasks. In general it's much 
  better to make sure the completion IRQs are rate-limited and just 
  schedule. This (combined with a metric ton of fine details) is what 
  the networking code does in essence, and they have no trouble reaching 
  very high throughput.
 
 It's not about throughput - it's about latency. Don't ever confuse the 
 two, they have almost nothing in common. Networking very very seldom has 
 the kind of submit and wait for immediate result issues that disk 
 reads do.

Yeah, indeed that's true, the dd measurement Matthew did issued IO at a 
rate of one sector at a time and waiting for every sector to complete:

dd if=/dev/nvme0n1 of=/dev/null iflag=direct bs=512 count=100

So my suggestions about batching and IRQ rate control are immaterial...

 That said, I dislike the patch intensely. I do not think it's at all a 
 good idea to look at need_resched to say I can spin now. You're 
 still wasting CPU cycles.
 
 So Willy, please do *not* mix this up with the scheduler, or at least 
 not need_resched. Instead, maybe we should introduce a notion of if 
 we are switching to the idle thread, let's see if we can try to do some 
 IO synchronously.
 
 You could try to do that either *in* the idle thread (which would take 
 the context switch overhead - maybe negating some of the advantages), or 
 alternatively hook into the scheduler idle logic before actually doing 
 the switch.
 
 But anything that starts polling when there are other runnable processes 
 to be done sounds really debatable. Even if it's only 5us or so. 
 There's a lot of real work that could be done in 5us.

I'm wondering, how will this scheme work if the IO completion latency is a 
lot more than the 5 usecs in the testcase? What if it takes 20 usecs or 
100 usecs or more?

Will we still burn our CPU time, wasting power, inflating this CPU's load 
which keeps other CPUs from balancing tasks over to this CPU, etc?

In the 5 usecs case it looks beneficial to do. In the longer-latency cases 
I'm not so sure.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar

* Jens Axboe ax...@kernel.dk wrote:

 - With the former note, the app either needs to opt in (and hence
   willingly sacrifice CPU cycles of its scheduling slice) or it needs to 
   be nicer in when it gives up and goes back to irq driven IO.

The scheduler could look at sleep latency averages of the task in question 
- we measure that already in most cases.

If the 'average sleep latency' is below a certain threshold, the 
scheduler, if it sees that the CPU is about to go idle, could delay doing 
the context switch and do light idle-polling, for say twice the length 
of the expected sleep latency - assuming the CPU is otherwise idle - 
before it really schedules away the task and the CPU goes idle.

This would still require an IRQ and a wakeup to be taken, but would avoid 
the context switch.

Yet I have an ungood feeling about depending on actual latency values so 
explicitly. There will have to be a cutoff value, and if a workload is 
just below or just above that threshold then behavior will change 
markedly. Such schemes rarely worked out nicely in the past. [Might still 
be worth trying it.]

Couldn't the block device driver itself estimate the expected latency of 
IO completion and simply poll if that's expected to be very short [such as 
there's only a single outstanding IO to a RAM backed device]? IO drivers 
doing some polling and waiting in the microseconds range isnt overly 
controversial. I'd even do that if the CPU is busy otherwise: the task 
should see a proportional slowdown as load increases, with no change in IO 
queueing behavior.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-24 Thread Ingo Molnar

* David Ahern dsah...@gmail.com wrote:

 On 6/23/13 3:09 AM, Ingo Molnar wrote:
 If an IO driver is implemented properly then it will batch up requests for
 the controller, and gets IRQ-notified on a (sub-)batch of buffers
 completed.
 
 If there's any spinning done then it should be NAPI-alike polling: a
 single is stuff completed polling pass per new block of work submitted,
 to opportunistically interleave completion with submission work.
 
 I don't see where active spinning brings would improve performance
 compared to a NAPI-alike technique. Your numbers obviously show a speedup
 we'd like to have, I'm just wondering whether the same speedup (or even
 more) could be implemented via:
 
   - smart batching that rate-limits completion IRQs in essence
   + NAPI-alike polling
 
 ... which would almost never result in IRQ driven completion when we are
 close to CPU-bound and while not yet saturating the IO controller's
 capacity.
 
 The spinning approach you add has the disadvantage of actively wasting CPU
 time, which could be used to run other tasks. In general it's much better
 to make sure the completion IRQs are rate-limited and just schedule. This
 (combined with a metric ton of fine details) is what the networking code
 does in essence, and they have no trouble reaching very high throughput.
 
 Networking code has a similar proposal for low latency sockets using 
 polling: https://lwn.net/Articles/540281/

In that case it might make sense to try the generic approach I suggested 
in the previous mail, which would measure average sleep latencies of 
tasks, and would do light idle-polling instead of the more expensive 
switch-to-the-idle-task context switch plus associated RCU, nohz, etc. 
busy-CPU-tear-down and the symmetric build-up work on idle wakeup.

The IO driver would still have to take an IRQ though, preferably on the 
CPU that runs the IO task ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Allow block drivers to poll for I/O instead of sleeping

2013-06-23 Thread Ingo Molnar

* Matthew Wilcox wi...@linux.intel.com wrote:

 
 A paper at FAST2012
 (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed
 out the performance overhead of taking interrupts for low-latency block
 I/Os.  The solution the author investigated was to spin waiting for each
 I/O to complete.  This is inefficient as Linux submits many I/Os which
 are not latency-sensitive, and even when we do submit latency-sensitive
 I/Os (eg swap-in), we frequently submit several I/Os before waiting.
 
 This RFC takes a different approach, only spinning when we would
 otherwise sleep.  To implement this, I add an 'io_poll' function pointer
 to backing_dev_info.  I include a sample implementation for the NVMe
 driver.  Next, I add an io_wait() function which will call io_poll()
 if it is set.  It falls back to calling io_schedule() if anything goes
 wrong with io_poll() or the task exceeds its timeslice.  Finally, all
 that is left is to judiciously replace calls to io_schedule() with
 calls to io_wait().  I think I've covered the main contenders with
 sleep_on_page(), sleep_on_buffer() and the DIO path.
 
 I've measured the performance benefits of this with a Chatham NVMe
 prototype device and a simple
 # dd if=/dev/nvme0n1 of=/dev/null iflag=direct bs=512 count=100
 The latency of each I/O reduces by about 2.5us (from around 8.0us to
 around 5.5us).  This matches up quite well with the performance numbers
 shown in the FAST2012 paper (which used a similar device).

Nice speedup!

 Is backing_dev_info the right place for this function pointer?
 Have I made any bad assumptions about the correct way to retrieve
 the backing_dev_info from (eg) a struct page, buffer_head or kiocb?
 Should I pass a 'state' into io_wait() instead of retrieving it from
 'current'?  Is kernel/sched/core.c the right place for io_wait()?
 Should it be bdi_wait() instead?  Should it be marked with __sched?
 Where should I add documentation for the io_poll() function pointer?
 
 NB: The NVMe driver piece of this is for illustrative purposes only and
 should not be applied.  I've rearranged the diff so that the interesting
 pieces appear first.


-EMISSINGDIFFSTAT

 diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
 index c388155..97f8fde 100644
 --- a/include/linux/backing-dev.h
 +++ b/include/linux/backing-dev.h
 @@ -68,6 +68,7 @@ struct backing_dev_info {
   unsigned int capabilities; /* Device capabilities */
   congested_fn *congested_fn; /* Function pointer if device is md/dm */
   void *congested_data;   /* Pointer to aux data for congested func */
 + int (*io_poll)(struct backing_dev_info *);
  
   char *name;
  
 @@ -126,6 +127,8 @@ int bdi_has_dirty_io(struct backing_dev_info *bdi);
  void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi);
  void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2);
  
 +void io_wait(struct backing_dev_info *bdi);
 +
  extern spinlock_t bdi_lock;
  extern struct list_head bdi_list;
  
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 58453b8..4840065 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4527,6 +4527,36 @@ long __sched io_schedule_timeout(long timeout)
   return ret;
  }
  
 +/*
 + * Wait for an I/O to complete against this backing_dev_info.  If the
 + * task exhausts its timeslice polling for completions, call io_schedule()
 + * anyway.  If a signal comes pending, return so the task can handle it.
 + * If the io_poll returns an error, give up and call io_schedule(), but
 + * swallow the error.  We may miss an I/O completion (eg if the interrupt
 + * handler gets to it first).  Guard against this possibility by returning
 + * if we've been set back to TASK_RUNNING.
 + */
 +void io_wait(struct backing_dev_info *bdi)
 +{
 + if (bdi  bdi-io_poll) {
 + long state = current-state;
 + while (!need_resched()) {
 + int ret = bdi-io_poll(bdi);
 + if ((ret  0) || signal_pending_state(state, current)) {
 + set_current_state(TASK_RUNNING);
 + return;
 + }
 + if (current-state == TASK_RUNNING)
 + return;
 + if (ret  0)
 + break;
 + cpu_relax();
 + }
 + }
 +
 + io_schedule();
 +}

I'm wondering why this makes such a performance difference.

If an IO driver is implemented properly then it will batch up requests for 
the controller, and gets IRQ-notified on a (sub-)batch of buffers 
completed.

If there's any spinning done then it should be NAPI-alike polling: a 
single is stuff completed polling pass per new block of work submitted, 
to opportunistically interleave completion with submission work.

I don't see where active spinning brings would improve performance 
compared to a NAPI-alike technique. Your numbers 

Re: [git patches] libata updates

2012-07-26 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

 I couldn't find an example of that in a quick look, it's 
 fairly uncommon to have non-conflicting merges that had 
 semantic - but not contextual - conflicts. [...]

This:

  git log --grep='Semantic merge\|Semantic conflict'

gives over a dozen examples of such semantic merges I've done in 
the past 4 years. I fully agree that they are best done in the 
merge commit - in hindsight perhaps with a tad more explanation 
than I did.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-04 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 so please tell me Jeff. If Greg, who is the super-maintainer of your 
 code area, and who deals with your code every day and changes it 
 every minute and hour, simply did not Cc: the SCSI list - how am i, a 
 largely outside party in this matter, supposed to notice that 3 
 maintainers and 3 mailing lists in the Cc: were somehow not enough 
 and that i was supposed to grow the already sizable Cc: list even 
 more?

 Because, regardless of the situation, it's both common courtesy and 
 wise practice to CC relevant driver maintainers, when you touch a 
 driver.

 And it's just common sense: Greg simply does not know the intimate 
 details of every PCI driver.  Nor do I.  Nor you.

 In the case of lpfc here, we have an active driver maintainer, and an 
 up-to-date MAINTAINERS entry.  Even if you are too slack to read 
 MAINTAINERS, 'git log' would have given you the same info.

 Don't pretend there is some benefit here to ignoring the people that 
 best know the driver.  I don't buy that; it simply makes no 
 engineering sense whatsoever.

what you _STILL_ do not realize is the following: you still attribute 
the lack of Cc:s to some intention of mine. No, it was not my intention. 
At first glance the Cc: looked large and complete enough in an 
_existing_ discussion and that's was the end of my (brief) attention 
regarding the Cc: line. Yes, it would have been a bit better had i 
noticed the lack of Cc:s in an existing discussion, but i didnt.

[ And it might not surprise you if i observe here that i think this
  little mishap is further (incidental) proof that having this many
  mailing list aliases to get the 'guaranteed attention' of maintainers 
  is just super fragile and does not serve users at all. Even Greg and i 
  got it wrong accidentally. If _we_ get it wrong, who will get it 
  right? I see several mis-Cc:ed emails every day and there's over a 
  1000 unfixed bugs in bugzilla for 2.6 alone. It does not take a genius 
  to observe that something is fundamentally wrong ;-) That 2.6 works on
  your or my box is a _very_ poor metric - we both fix bugs on our 
  systems super fast so we've got a very biased first-hand experience 
  about the stability and reliability of the kernel. We never really 
  feel the kind of frustration that testers feel when their mail to lkml 
  gets ignored. We never really feel the helplessness that comes from 
  unfixed bugs. ]

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 ===
 --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
 +++ linux/drivers/scsi/lpfc/lpfc_init.c
 @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
  uint16_t iotag;
  int bars = pci_select_bars(pdev, IORESOURCE_MEM);
  -   if (pci_enable_device_bars(pdev, bars))
 +if (pci_enable_device_io(pdev))
  goto out;

 Look at the line right above it...  AFAICS you want 
 pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.

 Also a CC to linux-scsi and the driver author would be nice, as they 
 are the ones with hardware and can verify.

it would have been totally appropriate for me to just send a mail to 
lkml with the proper subject line about the breakage. (I might even have 
decided to stay completely silent about the issue and fix it for my own 
build, letting you guys figure it out.)

Instead i did a search of lkml (based on the function name in the build 
error) and figured out where the pull request was on lkml: Greg. I 
replied to that mail, he'll obviously know whom else to Cc from that 
point on (if anyone). I really didnt want to (nor did i need to) figure 
out whether this was some general driver level API change that happen 
kernel-wide, or some SCSI specific change. I simply replied to the pull 
request whose Cc: line seemed well-populated to me already. I also took 
a look at the commit itself and did a quick hack in a hurry to keep the 
tests rolling. It really did not occur to me that i should have added 
anyone else to the Cc: line, as [EMAIL PROTECTED] was 
Cc:-ed already so i assumed the interest was from that angle.

( And as this was spent from my family's weekend time and i had no time
  and no interest to dig any further than to figure out the first hop
  of the change that broke the build, and the parties who initiated that
  hop. I'm in fact surprised that your and James's answer to my
  bugreport is hostility. )

So i find your suggestion that i should have added more people to the 
Cc: line unfair on several levels.

 This set of changes seemed like 50% guesswork to me, without 
 consulting the authors :( And unlike many changes, you actually have 
 to know the hardware [or get clues from surrounding code] to make the 
 change.

you mean the whole set of changes? Or just mine? Mine was a 30 seconds 
guesswork of course, i dont have the hardware, i didnt do the change, 
nor did i do anything near that code - i just saw the build failure stop 
my testing and sent in this notification and a hack. That's why i sent 
it to Greg, as a reply to the mail where he pushed these changes 
upstream and who'll know what to do with it from that point on. My patch 
can be totally thrown away (and should be, apparently).

but ... i guess next time i'll think twice before sending any bugreports 
about or related to the SCSI code anywhere, unless they become really 
annoying. Who needs this hassle?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 it would have been totally appropriate for me to just send a mail to lkml 
 with the proper subject line about the breakage. (I might even have 
 decided to stay completely silent about the issue and fix it for my own 
 build, letting you guys figure it out.)

 Oh come on...  You are smart enough to know to at least CC the driver 
 maintainer, the key POC who should be aware of breakage of their 
 driver.  That is a standard courtesy.

is there any particular reason why you cut out the most relevant part of 
my reply, which happens to answer all your questions AFAICS:

 Instead i did a search of lkml (based on the function name in the 
 build error) and figured out where the pull request was on lkml: 
 Greg. I replied to that mail, he'll obviously know whom else to Cc 
 from that point on (if anyone). I really didnt want to (nor did i 
 need to) figure out whether this was some general driver level API 
 change that happen kernel-wide, or some SCSI specific change. I 
 simply replied to the pull request whose Cc: line seemed 
 well-populated to me already. I also took a look at the commit itself 
 and did a quick hack in a hurry to keep the tests rolling. It really 
 did not occur to me that i should have added anyone else to the Cc: 
 line, as [EMAIL PROTECTED] was Cc:-ed already so i 
 assumed the interest was from that angle.

had you read this portion you'd have realized that i did not search for 
any owner of the file, i simply searched for the person who introduced 
the change, and the on-lkml mail where the change was introduced.

and that's all that should be needed, really. Believe me, i hit tons of 
bugs all across the kernel, often several bugs a day, and it's hard even 
for me to figure out who maintains a file and when. (and in Linux 
there's no ownership of files anyway) So as a general rule i go after 
changes instead, and that's exactly what i did here too. I do 
delta/regression QA - i.e. i watch for _changes_ that break the kernel 
and hence the general 'owner' of a file is often irrelevant - it's the 
maintainer who introduces a change who matters, and we do lots of 
cross-maintain merges. Only if i do not manage to identify a change do i 
try to figure out who maintains a file at that given moment. (But those 
mails often go into black holes, they get bounced, subscriber-required 
email lists, etc. etc.) It's also nontrivial to map the files to the 
MAINTAINERS file, and it's also quite outdated in some portions. So the 
MAINTAINERS file is the last resort i use.

so i'm still totally befuddled why you think that there was anything 
particularly wrong or unhelpful about me replying to the specific pull 
request that introduced a particular breakage into the kernel. Had i 
mailed to lkml with a terse kernel build broke message with just an 
URL to a config and the build breakage, you could rightfully have 
complained that i should have done more to properly direct my bugreport. 
But this breakage was about a PCI API change, the pull request had a PCI 
mailing list Cc:-ed, why should i have thought that this needs the 
attention of any other parties?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-07 Thread Ingo Molnar

* Bob Tracy [EMAIL PROTECTED] wrote:

  Current state of the source tree is the 6f37ac... version, so I'll 
  start backing out the above diffs in related groups and continue 
  until I've got a working kernel.  For lack of an obvious target, 
  I'll start with the seemingly innocuous change to sysctl_check.c.  
  I'll report back when I've got something.
 
 That was quick :-).  Backing out the sysctl_check.c diff gives me a 
 working kernel.  Beats the [EMAIL PROTECTED] out of me how/why, though.
 
 Michael Cree: could you try backing out the diff below from your 
 2.6.24-rc3 tree and see if things are now working for you?
 
 Here's uname -a, just to confirm (maybe) I'm running on what I say 
 works:
 
 Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha
 
 Here's the diff I backed out (patch -R).  It's short...
 
 diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
 index 5a2f2b2..4abc6d2 100644
 --- a/kernel/sysctl_check.c
 +++ b/kernel/sysctl_check.c
 @@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = {
   { NET_ROSE, rose, trans_net_rose_table },
   { NET_IPV6, ipv6, trans_net_ipv6_table },
   { NET_X25,  x25,  trans_net_x25_table },
 - { NET_TR,   tr,   trans_net_tr_table },
 + { NET_TR,   token-ring,   trans_net_tr_table },
   { NET_DECNET,   decnet,   trans_net_decnet_table },
   /*  NET_ECONET not used */
   { NET_SCTP, sctp, trans_net_sctp_table },

reverting this makes the kernel image shorter by 8 bytes - so perhaps 
some alignment issue somewhere? Or something gets overflown? Does any of 
this get actually used by your bootup?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-07 Thread Ingo Molnar

* Bob Tracy [EMAIL PROTECTED] wrote:

  I'm struggling to see how any of those could have broken block 
  device mounting on alpha.  Are you sure you bisected right?
 
 Based on what's in that commit, it *does* appear something went wrong 
 with bisection.  If the implicated commit is the next one in time 
 sequence relative to
 
 # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap 
 INLINE and name timeval_cmp better
 
 then the test of whether I bisected correctly is as simple as applying 
 the commit and seeing if things break, because I'm running on the 
 kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right 
 now.  Let me give that a try and I'll report back.  Worst case, I'll 
 have to start over and write off the past four days...

generally it's easier to just go back in time and re-try the last 
known good and last-known bad commit IDs to establish that they are 
indeed correctly identified. if they are not then step back one more in 
the bisection log. No need to spend another 4 days on this, if most of 
the bisection is OK. You can replay a corrected git bisection log 
quickly, by doing:

  git-bisect reset
  git-bisect  bisect.log

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-07 Thread Ingo Molnar

* Bob Tracy [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
  
  * Bob Tracy [EMAIL PROTECTED] wrote:
  
Current state of the source tree is the 6f37ac... version, so I'll 
start backing out the above diffs in related groups and continue 
until I've got a working kernel.  For lack of an obvious target, 
I'll start with the seemingly innocuous change to sysctl_check.c.  
I'll report back when I've got something.
   
   That was quick :-).  Backing out the sysctl_check.c diff gives me a 
   working kernel.  Beats the [EMAIL PROTECTED] out of me how/why, though.
   
   Michael Cree: could you try backing out the diff below from your 
   2.6.24-rc3 tree and see if things are now working for you?
   
   Here's uname -a, just to confirm (maybe) I'm running on what I say 
   works:
   
   Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 
   alpha
   
   Here's the diff I backed out (patch -R).  It's short...
   
   diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
   index 5a2f2b2..4abc6d2 100644
   --- a/kernel/sysctl_check.c
   +++ b/kernel/sysctl_check.c
   @@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = {
 { NET_ROSE, rose, trans_net_rose_table },
 { NET_IPV6, ipv6, trans_net_ipv6_table },
 { NET_X25,  x25,  trans_net_x25_table },
   - { NET_TR,   tr,   trans_net_tr_table },
   + { NET_TR,   token-ring,   trans_net_tr_table },
 { NET_DECNET,   decnet,   trans_net_decnet_table },
 /*  NET_ECONET not used */
 { NET_SCTP, sctp, trans_net_sctp_table },
  
  reverting this makes the kernel image shorter by 8 bytes - so 
  perhaps some alignment issue somewhere? Or something gets overflown? 
  Does any of this get actually used by your bootup?
 
 Dunno...  The dmesg output is not terribly useful here, because most 
 of the interesting stuff concerning udev startup that appears on the 
 console never makes it into a log.  Note that, for the bad cases, I 
 don't see the same console output that Michael reported, although the 
 net effect is the same: the partitions don't get found, so I'm offered 
 the chance to enter my root password and do some poking around, and 
 when I do, none of the block devices are present under /dev.
 
 I'm open to suggestions on how to take this analysis further.  Michael 
 indicated he's running a conference this week, so I don't know when 
 he'll be able to come up for air.

i'm not sure how to do direct debugging on udev, so i can only guess 
about what effect on the kernel side could have caused this. One bad 
hack would be to probe udevd's behavior by changing the NET_TR entry 
in various ways:

  tr - token-ring # breaks
  tr - tr # works
  tr - token-rin0 # ?(1)
  tr - TR # ?(2)

the question is, does tweak (1) and tweak (2) work or break?

but it would be a lot more effective i guess to get some udevd expert's 
attention on this ...

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-07 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

  then the test of whether I bisected correctly is as simple as 
  applying the commit and seeing if things break, because I'm running 
  on the kernel corresponding to 
  2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right now.  Let me give 
  that a try and I'll report back.  Worst case, I'll have to start 
  over and write off the past four days...
 
 Gad.  I trust the second time will be faster.
 
 git-bisect _is_ very error prone.  I find one of the problems is that 
 each step is so far apart in time that you forget what you were doing.  
 Did I remember to test that iteration?  Did I install the right 
 kernel?  etc.

i have a fully automated bootup-hang bisection script. It is based on 
git-bisect run. I run the script, it builds and boots kernels fully 
automatically, and when the bootup fails (the script notices that via 
the serial log, which it continuously watches - or via a timeout, if the 
system does not come up within 10 minutes it's a bad kernel), the 
script raises my attention via a beep and i power cycle the test box. 
(yeah, i should make use of a managed power outlet to 100% automate it) 

So i dont have to a single manual decision anytime during the bisection. 
But the scripts are very much tied to my ad-hoc test environment so it 
would not be of much general use.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha

2007-12-04 Thread Ingo Molnar

* Bob Tracy [EMAIL PROTECTED] wrote:

 Finally got back in town.  Starting the git-bisect process.  I've got 
 a relatively slow network connection, and the PWS 433au isn't exactly 
 what I would call fast by modern standards, so bear with me while I 
 get things set up and crank through this.  The clone of the 2.6 tree 
 will take several more hours to finish downloading.  I anticipate the 
 best pace I'll be able to manage after that is two iterations in a 24- 
 hour period.

once you are done with the download of the initial cloned git repository 
(which is 200MB+), all the bisection steps will be local and you'll be 
only limited by kernel rebuild speed and by bootup and testing speed, 
not by network bandwidth.

( once you have the cloned repository i'd suggest for you to keep it - 
  that way you can track susequent kernels via git-pull and it uses a 
  very network-efficient delta protocol. )

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-12-04 Thread Ingo Molnar

* Kamalesh Babulal [EMAIL PROTECTED] wrote:

  So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not?
 
 Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4.

just to make sure: this is a real lockup and failed bootup (or device 
init), not just a message, right?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-12-04 Thread Ingo Molnar

* Kamalesh Babulal [EMAIL PROTECTED] wrote:

 Hi Ingo,
 
 This softlockup is seen in the 2.6.24-rc4 either and looks like a 
 message because this is seen while running tbench and machine 
 continues running other test's after the softlockup messages and some 
 times seen with the bootup, but the machines reaches the login prompt 
 and able to continue running tests.

do you know whether there's any true delay when this happens, or is it a 
pure softlockup-detector false positive?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/2] 2.6.23-rc2: known regressions with patches

2007-08-05 Thread Ingo Molnar

* Michal Piotrowski [EMAIL PROTECTED] wrote:

 Memory management
 
 Subject : [bug] SLUB  freeing locks
 References  : http://lkml.org/lkml/2007/7/26/90
 Last known good : ?
 Submitter   : Ingo Molnar [EMAIL PROTECTED]
 Caused-By   : ?
 Handled-By  : Peter Zijlstra [EMAIL PROTECTED]
 Patch   : http://lkml.org/lkml/2007/7/26/97
 Status  : patch available

fixed by commit 2208b764c14d0f1ad63da64b1a42db6077b6fe42.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove name length check in a workqueue

2005-08-10 Thread Ingo Molnar

yeah ... cannot remember why i have done it originally :-|

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo

On Wed, 10 Aug 2005, James Bottomley wrote:

 Ingo,
 
 This has been in the workqueue code in day one, for no real reason that
 I can see.  We just tripped over it in SCSI because the fibre channel
 transport class creates one workqueue per host with the name scsi_wq_%d
 which trips this after we get to 100.  Unfortunately we just came across
 someone with  100 host adapters ...
 
 I think the solution is just to get rid of the artificial limit.
 
 James
 
 diff --git a/kernel/workqueue.c b/kernel/workqueue.c
 --- a/kernel/workqueue.c
 +++ b/kernel/workqueue.c
 @@ -308,8 +308,6 @@ struct workqueue_struct *__create_workqu
   struct workqueue_struct *wq;
   struct task_struct *p;
  
 - BUG_ON(strlen(name)  10);
 -
   wq = kmalloc(sizeof(*wq), GFP_KERNEL);
   if (!wq)
   return NULL;
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html