Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution
* Alan Coxwrote: > On Mon, 8 Jan 2018 11:08:36 +0100 > Peter Zijlstra wrote: > > > On Fri, Jan 05, 2018 at 10:30:16PM -0800, Dan Williams wrote: > > > On Fri, Jan 5, 2018 at 6:22 PM, Eric W. Biederman > > > wrote: > > > > In at least one place (mpls) you are patching a fast path. Compile out > > > > or don't load mpls by all means. But it is not acceptable to change the > > > > fast path without even considering performance. > > > > > > Performance matters greatly, but I need help to identify a workload > > > that is representative for this fast path to see what, if any, impact > > > is incurred. Even better is a review that says "nope, 'index' is not > > > subject to arbitrary userspace control at this point, drop the patch." > > > > I think we're focussing a little too much on pure userspace. That is, we > > should be saying under the attackers control. Inbound network packets > > could equally be under the attackers control. > > Inbound network packets don't come with a facility to read back and do > cache timimg. [...] But the reply packets can be measured on the sending side, and the total delay timing would thus carry the timing information. Yes, a lot of noise gets added that way if we think 'packet goes through the Internet' - but with gigabit local network access or even through localhost access a lot of noise can be removed as well. It's not as dangerous as a near instantaneous local attack, but 'needs a day of runtime to brute-force through localhost or 10GigE' is still worrying in many real-world security contexts. So I concur with Peter that we should generally consider making all of our responses to external data (maybe with the exception of pigeon post messages) Spectre-safe. Thanks, Ingo
Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
* Martin K. Petersen <martin.peter...@oracle.com> wrote: > >>>>> "James" == James Bottomley <james.bottom...@hansenpartnership.com> > >>>>> writes: > > James> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough > James> commands > > James> mpt3sas has a firmware failure where it can only handle one pass > James> through ATA command at a time. If another comes in, contrary to > James> the SAT standard, it will hang until the first one completes > James> (causing long commands like secure erase to timeout). The > James> original fix was to block the device when an ATA command came in, > James> but this caused a regression with > > Broadcom folks: Please test and ack as soon as possible so we can get > this fix queued up. > > Ingo: Since you appear to have hardware, it would be great if you could > test James' v3 (https://patchwork.kernel.org/patch/9519383/). Sorry for > the inconvenience. As per the interdiff below v2->v3 did not change the code in any way, only the name of the function and a comment, so you can add this to v3 as well: Reported-by: Ingo Molnar <mi...@kernel.org> Tested-by: Ingo Molnar <mi...@kernel.org> Thanks, Ingo diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 6f9b4c051e4d..830e2c10ba02 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3899,7 +3899,7 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc, } } -static int set_satl_pending(struct scsi_cmnd *scmd, bool pending) +static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending) { struct MPT3SAS_DEVICE *priv = scmd->device->hostdata; @@ -3934,7 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) if (!scmd) continue; count++; - set_satl_pending(scmd, false); + _scsih_set_satl_pending(scmd, false); mpt3sas_base_free_smid(ioc, smid); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery) @@ -4084,7 +4084,9 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) } /* -* Bug work around for firmware SATL handling +* Bug work around for firmware SATL handling. The loop +* is based on atomic operations and ensures consistency +* since we're lockless at this point */ do { if (sas_device_priv_data->ata_command_pending) { @@ -4092,7 +4094,7 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) scmd->scsi_done(scmd); return 0; } - } while (set_satl_pending(scmd, true)); + } while (_scsih_set_satl_pending(scmd, true)); sas_target_priv_data = sas_device_priv_data->sas_target; @@ -4661,7 +4663,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; - set_satl_pending(scmd, false); + _scsih_set_satl_pending(scmd, false); mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"
* James Bottomley <james.bottom...@hansenpartnership.com> wrote: > On Mon, 2017-01-16 at 10:22 +0100, Ingo Molnar wrote: > > * James Bottomley <james.bottom...@hansenpartnership.com> wrote: > > > > > On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote: > > > > So there's a new mpt3sas SCSI driver boot regression, introduced > > > > in > > > > this merge window, which made one of my servers unbootable. > > > > > > We're not reverting a fix that would cause regressions for others. > > > > You really need to reconsider that stance ... > > > > > However, The fix was manifestly wrong, so does this fix of the fix > > > work for you: > > > > > > http://marc.info/?l=linux-scsi=148329237807604 > > > > > > It's been languishing a bit because no-one seemed to care enough to > > > test or review it. IOf you can add a tested by, that will give the > > > two > > > we need to push it. > > > > I have tested your other patch that you pointed to: > > > > http://marc.info/?l=linux-scsi=148449968522828 > > > > Which patch fixes the bug too (I removed my revert first) - so you > > can add my: > > > > Reported-by: Ingo Molnar <mi...@kernel.org> > > Tested-by: Ingo Molnar <mi...@kernel.org> > > Thanks ... just checking you tested the second version with the > concurrency part? I tested the one I linked to: http://marc.info/?l=linux-scsi=148449968522828 I don't know which version that is exactly, I just tested what you asked me to test. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"
* James Bottomley <james.bottom...@hansenpartnership.com> wrote: > On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote: > > So there's a new mpt3sas SCSI driver boot regression, introduced in > > this merge window, which made one of my servers unbootable. > > We're not reverting a fix that would cause regressions for others. You really need to reconsider that stance ... > However, The fix was manifestly wrong, so does this fix of the fix work for > you: > > http://marc.info/?l=linux-scsi=148329237807604 > > It's been languishing a bit because no-one seemed to care enough to > test or review it. IOf you can add a tested by, that will give the two > we need to push it. I have tested your other patch that you pointed to: http://marc.info/?l=linux-scsi=148449968522828 Which patch fixes the bug too (I removed my revert first) - so you can add my: Reported-by: Ingo Molnar <mi...@kernel.org> Tested-by: Ingo Molnar <mi...@kernel.org> BTW., is it wise to work around the out of spec firmware in the mpt3sas code and leave the overly optimistic assumptions in the SCSI code intact? The problem is that other SCSI hardware could be affected as well - and especially enterprise class server hardware has long testing and thus regression latencies (as my example proves). Wouldn't it be more robust to only submit one pass-through command at a time from the SCSI layer, and maybe opt-in hardware that is known to implement the SAT standard fully? (But I'm just kibitzing here really.) Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"
nux-foundation.org> Date: Wed Dec 14 10:49:33 2016 -0800 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi ... which is a head-scratcher, so I double checked the key bisection points, but the bisection result is robust. I also re-created Linus's merge and double checked the conflict resolution - which looks fine as well. After (much) more testing it turns out that the bug is some sort of combination bug, in that scsi-next didn't have all the SCSI fixes that upstream already had, in particular it didn't have these commits: 7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset 18f6084a989b scsi: mpt3sas: Fix secure erase premature termination 6d3a56ed0985 scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk When Linus pulled in scsi-next-minus-fixes these two sets of commits combined and produced the regression - and made the bisection lead to the merge commit. So I manually rebased those 3 fixes on top of the scsi-next tree (f5b893c94715) and indeed one of them broke my box: 18f6084a989b scsi: mpt3sas: Fix secure erase premature termination I reverted it from latest upstream (with a minor conflict resolution), and that makes my box boot fine again. I have no idea which scsi-next commit this change interacted with, and it's not easy to find out so I'm not volunteering! It must be one of these 256 commits: e3a00f68e426..f5b893c94715 Note that reverting the first commit alone does not help: 7ff723ad0f87 scsi: mpt3sas: Unblock device after controller reset So it's reverting 18f6084a989b (while keeping ata_12_16_cmd() around to enable the 7ff723ad0f87 fix) that does the trick. Thanks, Ingo > >From 0734e6d2a7f757172d6b7750d8fcf602909300e6 Mon Sep 17 00:00:00 2001 From: Ingo Molnar <mi...@kernel.org> Date: Sun, 15 Jan 2017 09:59:39 +0100 Subject: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination" This reverts commit 18f6084a989ba1b38702f9af37a2e4049a924be6. Conflicts: drivers/scsi/mpt3sas/mpt3sas_scsih.c Signed-off-by: Ingo Molnar <mi...@kernel.org> --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index b5c966e319d3..3573daa2cce8 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -4063,13 +4063,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) if (ioc->logging_level & MPT_DEBUG_SCSI) scsi_print_command(scmd); - /* -* Lock the device for any subsequent command until command is -* done. -*/ - if (ata_12_16_cmd(scmd)) - scsi_internal_device_block(scmd->device); - sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { scmd->result = DID_NO_CONNECT << 16; @@ -4650,9 +4643,6 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; - if (ata_12_16_cmd(scmd)) - scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); - mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); if (mpi_reply == NULL) { -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
* Josh Poimboeuf <jpoim...@redhat.com> wrote: > On Sat, Apr 16, 2016 at 11:03:32AM +0200, Ingo Molnar wrote: > > > > * Josh Poimboeuf <jpoim...@redhat.com> wrote: > > > > > > I don't think we know yet if there's a reliable way to turn the bug off. > > > > > > > > Also, according to the gcc guys, this bug won't always result in a > > > > truncated function, and may sometimes just make some inline function > > > > call sites disappear: > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14 > > > > > > > > though I haven't been able to confirm that experimentally. But if it's > > > > true, that means that objtool won't be able to detect all cases of the > > > > bug and some function calls may just silently disappear! > > > > > > > > There's a lot of activity in the bug now, so hopefully they'll be able > > > > to tell us soon if there's a reliable way to avoid it and/or detect it. > > > > > > > > BTW, Denys posted a workaround patch for the qla2 code: > > > > > > > > > > > > https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlas...@redhat.com > > > > > > Martin Jambor wrote a succinct summary of the conditions needed for this > > > bug: > > > > > > "This bug can occur when an inlineable function containing a call to > > > __builtin_constant_p, which checks a parameter or a value it > > > references and a (possibly indirect) caller of the function actually > > > passes a constant, but stores it using a type of a different size." > > > > > > So to prevent it from happening elsewhere in the kernel, it sounds like > > > we'd have to either remove all uses of __builtin_constant_p() or disable > > > inlining completely. > > > > > > There's also no reliable way to detect the bug has occurred, though > > > objtool will detect it in cases when the function gets truncated. > > > > So it appears to me that due to the hard to detect nature of the GCC bug > > the fix > > will probably be backported by them, so I think we should be fine with > > relying on > > objtool to detect weird code sequences in the kernel, and should work > > around > > specific instances of the bug. > > I agree. So how should we work around the bug in this case? There have > been several suggestions: > > - change wwn_to_u64() to __always_inline > > - change qla2x00_get_host_fabric_name() to skip the unnecessary call to > wwn_to_u64() > > - revert one of the two commits: > bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some > byteswap operations") > ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") The first option sounds like the best one by far: it does a change that is related to the GCC bug (tweaks inlining), has near zero impact and does not revert other useful progress. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
* Josh Poimboeufwrote: > > I don't think we know yet if there's a reliable way to turn the bug off. > > > > Also, according to the gcc guys, this bug won't always result in a > > truncated function, and may sometimes just make some inline function > > call sites disappear: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c14 > > > > though I haven't been able to confirm that experimentally. But if it's > > true, that means that objtool won't be able to detect all cases of the > > bug and some function calls may just silently disappear! > > > > There's a lot of activity in the bug now, so hopefully they'll be able > > to tell us soon if there's a reliable way to avoid it and/or detect it. > > > > BTW, Denys posted a workaround patch for the qla2 code: > > > > > > https://lkml.kernel.org/r/1460716583-15673-1-git-send-email-dvlas...@redhat.com > > Martin Jambor wrote a succinct summary of the conditions needed for this > bug: > > "This bug can occur when an inlineable function containing a call to > __builtin_constant_p, which checks a parameter or a value it > references and a (possibly indirect) caller of the function actually > passes a constant, but stores it using a type of a different size." > > So to prevent it from happening elsewhere in the kernel, it sounds like > we'd have to either remove all uses of __builtin_constant_p() or disable > inlining completely. > > There's also no reliable way to detect the bug has occurred, though > objtool will detect it in cases when the function gets truncated. So it appears to me that due to the hard to detect nature of the GCC bug the fix will probably be backported by them, so I think we should be fine with relying on objtool to detect weird code sequences in the kernel, and should work around specific instances of the bug. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: This patch triggers a bad gcc bug (was Re: [PATCH] force inlining of some byteswap operations)
* Denys Vlasenkowrote: > > In fact, the following patch seems to fix it: > > > > diff --git a/include/scsi/scsi_transport_fc.h > > b/include/scsi/scsi_transport_fc.h > > index bf66ea6..56b9e81 100644 > > --- a/include/scsi/scsi_transport_fc.h > > +++ b/include/scsi/scsi_transport_fc.h > > @@ -796,7 +796,7 @@ fc_remote_port_chkready(struct fc_rport *rport) > > return result; > > } > > > > -static inline u64 wwn_to_u64(u8 *wwn) > > +static __always_inline u64 wwn_to_u64(u8 *wwn) > > { > > return get_unaligned_be64(wwn); > > } > > It is not a guarantee. Of course it's a workaround - but is there any deterministic way to turn off this GCC bug (by activating some GCC command line switch), or do we have to live with objtool warning about this GCC? Which, by the way, is pretty cool! Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Decouple X86_32 dependency from the ISA Kconfig option
* William Breathitt Gray <vilhelm.g...@gmail.com> wrote: > On Wed, Apr 13, 2016 at 09:26:02AM +0200, Ingo Molnar wrote: > >What's the practical motivation of this? What exact hardware is this for? > > > >Thanks, > > > > Ingo > > The PC/104 bus is equivalent to the ISA bus regarding software > communication. Many small form factor systems have a PC/104 bus where > PC/104 cards may be stacked. Nowadays, these systems are commonly > running 64-bit processors such as the Intel Atom. > > I would like to utilize the ISA bus driver to support these PC/104 > devices (see http://lkml.org/lkml/2016/4/7/418), but the ISA > configuration option has an arbitrary X86_32 dependency. Decoupling the > X86_32 dependency from the ISA configuration option will allow these > PC/104 drivers to build for 64-bit architectures. > > The existing kernel drivers which I intend to utilize the ISA bus driver > in a X86_64 architecture for PC/104 support are the ACCES 104-DIO-48E > GPIO driver, the ACCES 104-IDI-48 GPIO driver, the ACCES 104-IDIO-16 > GPIO driver, and the Apex Embedded Systems STX104 DAC driver. I have > several more PC/104 devices for which I wish to write drivers, but I > would like to resolve this ISA bus driver situation before submitting > new code. Ah, ok, so it's for enabling real hardware, not just a cleanup, right? You might want to put that info into the boilerplate mail or so. I'm perfectly fine with all the patches that touch x86 code: Acked-by: Ingo Molnar <mi...@kernel.org> I suppose you'd like to have these in the driver tree, all in one place? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Decouple X86_32 dependency from the ISA Kconfig option
* William Breathitt Graywrote: > This patchset is based on top of commit 3a3a5fece6f2 ("fs: kernfs: Replace > CURRENT_TIME by current_fs_time()") of the driver-core-testing branch of > the driver-core repository. > > The introduction of the ISA_BUS option in commit b3c1be1b789c > ("base: isa: Remove X86_32 dependency") blocks the compilation of ISA > drivers on non-x86 platforms. The ISA_BUS configuration option should not > be necessary if the X86_32 dependency can be decoupled from the ISA > configuration option. This patchset both removes the ISA_BUS configuration > option entirely and decouples the X86_32 dependency from the ISA > configuration option. > > The PNPBIOS driver requires preprocessor defines (located in > include/asm/segment.h) only declared if the architecture is set to X86_32. > If the architecture is set to X86_64, the PNPBIOS driver will not build > properly. The X86 dependecy for the PNPBIOS configuration option is changed > to an explicit X86_32 dependency in order to prevent an attempt to build > for an unsupported architecture. > > Changes to the ISA SSCAPE and SCSI ULTRASTOR drivers are also included. The > relevant patches simply fix format string identifier mismatches exposed > during an attempted X86_64 compilation after the decoupling of the X86_32 > dependency from the ISA configuration option. These patches fix compilation > warnings rather than errors, but the solutions were so trivial that I > decided to include them in this patchset. If it would be inappropriate to > include them in this patchset, let me know and I will rebase to remove the > relevant patches. > > William Breathitt Gray (4): > pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS > sound: isa: sscape: Use correct format identifier for size_t > scsi: ultrastor: Use correct format identifier for kernel pointer > isa: Remove the ISA_BUS Kconfig option > > arch/x86/Kconfig| 10 ++ > drivers/base/Makefile | 2 +- > drivers/pnp/pnpbios/Kconfig | 2 +- > drivers/scsi/ultrastor.c| 8 > include/linux/isa.h | 2 +- > sound/isa/sscape.c | 2 +- > 6 files changed, 10 insertions(+), 16 deletions(-) What's the practical motivation of this? What exact hardware is this for? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] x86: remove vmalloc.h from asm/io.h
* Stephen Rothwell s...@canb.auug.org.au wrote: Nothing in asm/io.h uses anything from vmalloc.h, so remove the include and fix up the build problems in an allmodconfig (64 bit and 32 bit) build. This may be the place where x86 builds get vmalloc.h implicitly included and that tends to hide places where vmalloc() et al are added to files but the include of vmalloc.h is forgotten. Good idea. Acked-by: Ingo Molnar mi...@kernel.org Based in Linus' tree of today. There are probably more places that need vmalloc.h included, but this passes 64 bit and 32 bit allmodconfig builds, so is a place to start. Please also test x86 allnoconfig and defconfig 32/64, that tends to unearth the remaining places. People doing randconfig testing will find the rest. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)
* Oleg Nesterov o...@redhat.com wrote: On 12/23, Ingo Molnar wrote: * Oleg Nesterov o...@redhat.com wrote: Initially I thought that this is obviously wrong, irqsave/irqrestore assume that flags is owned by the caller, not by the lock. And iirc this was certainly wrong in the past. But when I look at spinlock.c it seems that this code can actually work. _irqsave() writes to FLAGS after it takes the lock, and _irqrestore() has a copy of FLAGS before it drops this lock. I don't think that's true: if it was then the lock would not be irqsave, a hardware-irq could come in after the lock has been taken and before flags are saved+disabled. I do agree that this pattern is not safe, that is why I decided to ask. But, unless I missed something, with the current implementation spin_lock_irqsave(lock, global_flags) does: unsigned long local_flags; local_irq_save(local_flags); spin_lock(lock); global_flags = local_flags; so the access to global_flags is actually serialized by lock. You are right, today that's true technically because IIRC due to Sparc quirks we happen to return 'flags' as a return value - still it's very ugly and it could break anytime if we decide to do more aggressive optimizations and actually directly save into 'flags'. Note that even today there's a narrow exception: on UP we happen to build it the other way around, so that we do: local_irq_save(global_flags); __acquire(lock); This does not matter for any real code because on UP there is no physical lock and __acquire() is empty code-wise, but any compiler driven locking analysis tool using __attribute__ __context__(), if built on UP, would see the unsafe locking pattern. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: spinlock_irqsave() flags (Was: pm80xx: Spinlock fix)
* Oleg Nesterov o...@redhat.com wrote: On 12/23, Oleg Nesterov wrote: Perhaps we should ask the maintainers upstream? Even if this works, I am not sure this is _supposed_ to work. I mean, in theory spin_lock_irqave() can be changed as, say #define spin_lock_irqsave(lock, flags) \ do {\ local_irq_save(flags); \ spin_lock(lock);\ } while (0) (and iirc it was defined this way a long ago). In this case flags is obviously not protected. Yes, lets ask the maintainers. In short, is this code spinlock_t LOCK; unsigned long FLAGS; void my_lock(void) { spin_lock_irqsave(LOCK, FLAGS); } void my_unlock(void) { spin_unlock_irqrestore(LOCK, FLAGS); } correct or not? Initially I thought that this is obviously wrong, irqsave/irqrestore assume that flags is owned by the caller, not by the lock. And iirc this was certainly wrong in the past. But when I look at spinlock.c it seems that this code can actually work. _irqsave() writes to FLAGS after it takes the lock, and _irqrestore() has a copy of FLAGS before it drops this lock. I don't think that's true: if it was then the lock would not be irqsave, a hardware-irq could come in after the lock has been taken and before flags are saved+disabled. So AFAICS this is an unsafe pattern, beyond being ugly as hell. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
* Linus Torvalds torva...@linux-foundation.org wrote: On Sun, Jun 23, 2013 at 12:09 AM, Ingo Molnar mi...@kernel.org wrote: The spinning approach you add has the disadvantage of actively wasting CPU time, which could be used to run other tasks. In general it's much better to make sure the completion IRQs are rate-limited and just schedule. This (combined with a metric ton of fine details) is what the networking code does in essence, and they have no trouble reaching very high throughput. It's not about throughput - it's about latency. Don't ever confuse the two, they have almost nothing in common. Networking very very seldom has the kind of submit and wait for immediate result issues that disk reads do. Yeah, indeed that's true, the dd measurement Matthew did issued IO at a rate of one sector at a time and waiting for every sector to complete: dd if=/dev/nvme0n1 of=/dev/null iflag=direct bs=512 count=100 So my suggestions about batching and IRQ rate control are immaterial... That said, I dislike the patch intensely. I do not think it's at all a good idea to look at need_resched to say I can spin now. You're still wasting CPU cycles. So Willy, please do *not* mix this up with the scheduler, or at least not need_resched. Instead, maybe we should introduce a notion of if we are switching to the idle thread, let's see if we can try to do some IO synchronously. You could try to do that either *in* the idle thread (which would take the context switch overhead - maybe negating some of the advantages), or alternatively hook into the scheduler idle logic before actually doing the switch. But anything that starts polling when there are other runnable processes to be done sounds really debatable. Even if it's only 5us or so. There's a lot of real work that could be done in 5us. I'm wondering, how will this scheme work if the IO completion latency is a lot more than the 5 usecs in the testcase? What if it takes 20 usecs or 100 usecs or more? Will we still burn our CPU time, wasting power, inflating this CPU's load which keeps other CPUs from balancing tasks over to this CPU, etc? In the 5 usecs case it looks beneficial to do. In the longer-latency cases I'm not so sure. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
* Jens Axboe ax...@kernel.dk wrote: - With the former note, the app either needs to opt in (and hence willingly sacrifice CPU cycles of its scheduling slice) or it needs to be nicer in when it gives up and goes back to irq driven IO. The scheduler could look at sleep latency averages of the task in question - we measure that already in most cases. If the 'average sleep latency' is below a certain threshold, the scheduler, if it sees that the CPU is about to go idle, could delay doing the context switch and do light idle-polling, for say twice the length of the expected sleep latency - assuming the CPU is otherwise idle - before it really schedules away the task and the CPU goes idle. This would still require an IRQ and a wakeup to be taken, but would avoid the context switch. Yet I have an ungood feeling about depending on actual latency values so explicitly. There will have to be a cutoff value, and if a workload is just below or just above that threshold then behavior will change markedly. Such schemes rarely worked out nicely in the past. [Might still be worth trying it.] Couldn't the block device driver itself estimate the expected latency of IO completion and simply poll if that's expected to be very short [such as there's only a single outstanding IO to a RAM backed device]? IO drivers doing some polling and waiting in the microseconds range isnt overly controversial. I'd even do that if the CPU is busy otherwise: the task should see a proportional slowdown as load increases, with no change in IO queueing behavior. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
* David Ahern dsah...@gmail.com wrote: On 6/23/13 3:09 AM, Ingo Molnar wrote: If an IO driver is implemented properly then it will batch up requests for the controller, and gets IRQ-notified on a (sub-)batch of buffers completed. If there's any spinning done then it should be NAPI-alike polling: a single is stuff completed polling pass per new block of work submitted, to opportunistically interleave completion with submission work. I don't see where active spinning brings would improve performance compared to a NAPI-alike technique. Your numbers obviously show a speedup we'd like to have, I'm just wondering whether the same speedup (or even more) could be implemented via: - smart batching that rate-limits completion IRQs in essence + NAPI-alike polling ... which would almost never result in IRQ driven completion when we are close to CPU-bound and while not yet saturating the IO controller's capacity. The spinning approach you add has the disadvantage of actively wasting CPU time, which could be used to run other tasks. In general it's much better to make sure the completion IRQs are rate-limited and just schedule. This (combined with a metric ton of fine details) is what the networking code does in essence, and they have no trouble reaching very high throughput. Networking code has a similar proposal for low latency sockets using polling: https://lwn.net/Articles/540281/ In that case it might make sense to try the generic approach I suggested in the previous mail, which would measure average sleep latencies of tasks, and would do light idle-polling instead of the more expensive switch-to-the-idle-task context switch plus associated RCU, nohz, etc. busy-CPU-tear-down and the symmetric build-up work on idle wakeup. The IO driver would still have to take an IRQ though, preferably on the CPU that runs the IO task ... Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Allow block drivers to poll for I/O instead of sleeping
* Matthew Wilcox wi...@linux.intel.com wrote: A paper at FAST2012 (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf) pointed out the performance overhead of taking interrupts for low-latency block I/Os. The solution the author investigated was to spin waiting for each I/O to complete. This is inefficient as Linux submits many I/Os which are not latency-sensitive, and even when we do submit latency-sensitive I/Os (eg swap-in), we frequently submit several I/Os before waiting. This RFC takes a different approach, only spinning when we would otherwise sleep. To implement this, I add an 'io_poll' function pointer to backing_dev_info. I include a sample implementation for the NVMe driver. Next, I add an io_wait() function which will call io_poll() if it is set. It falls back to calling io_schedule() if anything goes wrong with io_poll() or the task exceeds its timeslice. Finally, all that is left is to judiciously replace calls to io_schedule() with calls to io_wait(). I think I've covered the main contenders with sleep_on_page(), sleep_on_buffer() and the DIO path. I've measured the performance benefits of this with a Chatham NVMe prototype device and a simple # dd if=/dev/nvme0n1 of=/dev/null iflag=direct bs=512 count=100 The latency of each I/O reduces by about 2.5us (from around 8.0us to around 5.5us). This matches up quite well with the performance numbers shown in the FAST2012 paper (which used a similar device). Nice speedup! Is backing_dev_info the right place for this function pointer? Have I made any bad assumptions about the correct way to retrieve the backing_dev_info from (eg) a struct page, buffer_head or kiocb? Should I pass a 'state' into io_wait() instead of retrieving it from 'current'? Is kernel/sched/core.c the right place for io_wait()? Should it be bdi_wait() instead? Should it be marked with __sched? Where should I add documentation for the io_poll() function pointer? NB: The NVMe driver piece of this is for illustrative purposes only and should not be applied. I've rearranged the diff so that the interesting pieces appear first. -EMISSINGDIFFSTAT diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index c388155..97f8fde 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -68,6 +68,7 @@ struct backing_dev_info { unsigned int capabilities; /* Device capabilities */ congested_fn *congested_fn; /* Function pointer if device is md/dm */ void *congested_data; /* Pointer to aux data for congested func */ + int (*io_poll)(struct backing_dev_info *); char *name; @@ -126,6 +127,8 @@ int bdi_has_dirty_io(struct backing_dev_info *bdi); void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi); void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2); +void io_wait(struct backing_dev_info *bdi); + extern spinlock_t bdi_lock; extern struct list_head bdi_list; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 58453b8..4840065 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4527,6 +4527,36 @@ long __sched io_schedule_timeout(long timeout) return ret; } +/* + * Wait for an I/O to complete against this backing_dev_info. If the + * task exhausts its timeslice polling for completions, call io_schedule() + * anyway. If a signal comes pending, return so the task can handle it. + * If the io_poll returns an error, give up and call io_schedule(), but + * swallow the error. We may miss an I/O completion (eg if the interrupt + * handler gets to it first). Guard against this possibility by returning + * if we've been set back to TASK_RUNNING. + */ +void io_wait(struct backing_dev_info *bdi) +{ + if (bdi bdi-io_poll) { + long state = current-state; + while (!need_resched()) { + int ret = bdi-io_poll(bdi); + if ((ret 0) || signal_pending_state(state, current)) { + set_current_state(TASK_RUNNING); + return; + } + if (current-state == TASK_RUNNING) + return; + if (ret 0) + break; + cpu_relax(); + } + } + + io_schedule(); +} I'm wondering why this makes such a performance difference. If an IO driver is implemented properly then it will batch up requests for the controller, and gets IRQ-notified on a (sub-)batch of buffers completed. If there's any spinning done then it should be NAPI-alike polling: a single is stuff completed polling pass per new block of work submitted, to opportunistically interleave completion with submission work. I don't see where active spinning brings would improve performance compared to a NAPI-alike technique. Your numbers
Re: [git patches] libata updates
* Linus Torvalds torva...@linux-foundation.org wrote: I couldn't find an example of that in a quick look, it's fairly uncommon to have non-conflicting merges that had semantic - but not contextual - conflicts. [...] This: git log --grep='Semantic merge\|Semantic conflict' gives over a dozen examples of such semantic merges I've done in the past 4 years. I fully agree that they are best done in the merge commit - in hindsight perhaps with a tad more explanation than I did. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
* Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: so please tell me Jeff. If Greg, who is the super-maintainer of your code area, and who deals with your code every day and changes it every minute and hour, simply did not Cc: the SCSI list - how am i, a largely outside party in this matter, supposed to notice that 3 maintainers and 3 mailing lists in the Cc: were somehow not enough and that i was supposed to grow the already sizable Cc: list even more? Because, regardless of the situation, it's both common courtesy and wise practice to CC relevant driver maintainers, when you touch a driver. And it's just common sense: Greg simply does not know the intimate details of every PCI driver. Nor do I. Nor you. In the case of lpfc here, we have an active driver maintainer, and an up-to-date MAINTAINERS entry. Even if you are too slack to read MAINTAINERS, 'git log' would have given you the same info. Don't pretend there is some benefit here to ignoring the people that best know the driver. I don't buy that; it simply makes no engineering sense whatsoever. what you _STILL_ do not realize is the following: you still attribute the lack of Cc:s to some intention of mine. No, it was not my intention. At first glance the Cc: looked large and complete enough in an _existing_ discussion and that's was the end of my (brief) attention regarding the Cc: line. Yes, it would have been a bit better had i noticed the lack of Cc:s in an existing discussion, but i didnt. [ And it might not surprise you if i observe here that i think this little mishap is further (incidental) proof that having this many mailing list aliases to get the 'guaranteed attention' of maintainers is just super fragile and does not serve users at all. Even Greg and i got it wrong accidentally. If _we_ get it wrong, who will get it right? I see several mis-Cc:ed emails every day and there's over a 1000 unfixed bugs in bugzilla for 2.6 alone. It does not take a genius to observe that something is fundamentally wrong ;-) That 2.6 works on your or my box is a _very_ poor metric - we both fix bugs on our systems super fast so we've got a very biased first-hand experience about the stability and reliability of the kernel. We never really feel the kind of frustration that testers feel when their mail to lkml gets ignored. We never really feel the helplessness that comes from unfixed bugs. ] Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
* Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: === --- linux.orig/drivers/scsi/lpfc/lpfc_init.c +++ linux/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) +if (pci_enable_device_io(pdev)) goto out; Look at the line right above it... AFAICS you want pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs. Also a CC to linux-scsi and the driver author would be nice, as they are the ones with hardware and can verify. it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. ( And as this was spent from my family's weekend time and i had no time and no interest to dig any further than to figure out the first hop of the change that broke the build, and the parties who initiated that hop. I'm in fact surprised that your and James's answer to my bugreport is hostility. ) So i find your suggestion that i should have added more people to the Cc: line unfair on several levels. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. you mean the whole set of changes? Or just mine? Mine was a 30 seconds guesswork of course, i dont have the hardware, i didnt do the change, nor did i do anything near that code - i just saw the build failure stop my testing and sent in this notification and a hack. That's why i sent it to Greg, as a reply to the mail where he pushed these changes upstream and who'll know what to do with it from that point on. My patch can be totally thrown away (and should be, apparently). but ... i guess next time i'll think twice before sending any bugreports about or related to the SCSI code anywhere, unless they become really annoying. Who needs this hassle? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
* Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Oh come on... You are smart enough to know to at least CC the driver maintainer, the key POC who should be aware of breakage of their driver. That is a standard courtesy. is there any particular reason why you cut out the most relevant part of my reply, which happens to answer all your questions AFAICS: Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. had you read this portion you'd have realized that i did not search for any owner of the file, i simply searched for the person who introduced the change, and the on-lkml mail where the change was introduced. and that's all that should be needed, really. Believe me, i hit tons of bugs all across the kernel, often several bugs a day, and it's hard even for me to figure out who maintains a file and when. (and in Linux there's no ownership of files anyway) So as a general rule i go after changes instead, and that's exactly what i did here too. I do delta/regression QA - i.e. i watch for _changes_ that break the kernel and hence the general 'owner' of a file is often irrelevant - it's the maintainer who introduces a change who matters, and we do lots of cross-maintain merges. Only if i do not manage to identify a change do i try to figure out who maintains a file at that given moment. (But those mails often go into black holes, they get bounced, subscriber-required email lists, etc. etc.) It's also nontrivial to map the files to the MAINTAINERS file, and it's also quite outdated in some portions. So the MAINTAINERS file is the last resort i use. so i'm still totally befuddled why you think that there was anything particularly wrong or unhelpful about me replying to the specific pull request that introduced a particular breakage into the kernel. Had i mailed to lkml with a terse kernel build broke message with just an URL to a config and the build breakage, you could rightfully have complained that i should have done more to properly direct my bugreport. But this breakage was about a PCI API change, the pull request had a PCI mailing list Cc:-ed, why should i have thought that this needs the attention of any other parties? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Bob Tracy [EMAIL PROTECTED] wrote: Current state of the source tree is the 6f37ac... version, so I'll start backing out the above diffs in related groups and continue until I've got a working kernel. For lack of an obvious target, I'll start with the seemingly innocuous change to sysctl_check.c. I'll report back when I've got something. That was quick :-). Backing out the sysctl_check.c diff gives me a working kernel. Beats the [EMAIL PROTECTED] out of me how/why, though. Michael Cree: could you try backing out the diff below from your 2.6.24-rc3 tree and see if things are now working for you? Here's uname -a, just to confirm (maybe) I'm running on what I say works: Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha Here's the diff I backed out (patch -R). It's short... diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c index 5a2f2b2..4abc6d2 100644 --- a/kernel/sysctl_check.c +++ b/kernel/sysctl_check.c @@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = { { NET_ROSE, rose, trans_net_rose_table }, { NET_IPV6, ipv6, trans_net_ipv6_table }, { NET_X25, x25, trans_net_x25_table }, - { NET_TR, tr, trans_net_tr_table }, + { NET_TR, token-ring, trans_net_tr_table }, { NET_DECNET, decnet, trans_net_decnet_table }, /* NET_ECONET not used */ { NET_SCTP, sctp, trans_net_sctp_table }, reverting this makes the kernel image shorter by 8 bytes - so perhaps some alignment issue somewhere? Or something gets overflown? Does any of this get actually used by your bootup? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Bob Tracy [EMAIL PROTECTED] wrote: I'm struggling to see how any of those could have broken block device mounting on alpha. Are you sure you bisected right? Based on what's in that commit, it *does* appear something went wrong with bisection. If the implicated commit is the next one in time sequence relative to # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap INLINE and name timeval_cmp better then the test of whether I bisected correctly is as simple as applying the commit and seeing if things break, because I'm running on the kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right now. Let me give that a try and I'll report back. Worst case, I'll have to start over and write off the past four days... generally it's easier to just go back in time and re-try the last known good and last-known bad commit IDs to establish that they are indeed correctly identified. if they are not then step back one more in the bisection log. No need to spend another 4 days on this, if most of the bisection is OK. You can replay a corrected git bisection log quickly, by doing: git-bisect reset git-bisect bisect.log Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Bob Tracy [EMAIL PROTECTED] wrote: Ingo Molnar wrote: * Bob Tracy [EMAIL PROTECTED] wrote: Current state of the source tree is the 6f37ac... version, so I'll start backing out the above diffs in related groups and continue until I've got a working kernel. For lack of an obvious target, I'll start with the seemingly innocuous change to sysctl_check.c. I'll report back when I've got something. That was quick :-). Backing out the sysctl_check.c diff gives me a working kernel. Beats the [EMAIL PROTECTED] out of me how/why, though. Michael Cree: could you try backing out the diff below from your 2.6.24-rc3 tree and see if things are now working for you? Here's uname -a, just to confirm (maybe) I'm running on what I say works: Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha Here's the diff I backed out (patch -R). It's short... diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c index 5a2f2b2..4abc6d2 100644 --- a/kernel/sysctl_check.c +++ b/kernel/sysctl_check.c @@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = { { NET_ROSE, rose, trans_net_rose_table }, { NET_IPV6, ipv6, trans_net_ipv6_table }, { NET_X25, x25, trans_net_x25_table }, - { NET_TR, tr, trans_net_tr_table }, + { NET_TR, token-ring, trans_net_tr_table }, { NET_DECNET, decnet, trans_net_decnet_table }, /* NET_ECONET not used */ { NET_SCTP, sctp, trans_net_sctp_table }, reverting this makes the kernel image shorter by 8 bytes - so perhaps some alignment issue somewhere? Or something gets overflown? Does any of this get actually used by your bootup? Dunno... The dmesg output is not terribly useful here, because most of the interesting stuff concerning udev startup that appears on the console never makes it into a log. Note that, for the bad cases, I don't see the same console output that Michael reported, although the net effect is the same: the partitions don't get found, so I'm offered the chance to enter my root password and do some poking around, and when I do, none of the block devices are present under /dev. I'm open to suggestions on how to take this analysis further. Michael indicated he's running a conference this week, so I don't know when he'll be able to come up for air. i'm not sure how to do direct debugging on udev, so i can only guess about what effect on the kernel side could have caused this. One bad hack would be to probe udevd's behavior by changing the NET_TR entry in various ways: tr - token-ring # breaks tr - tr # works tr - token-rin0 # ?(1) tr - TR # ?(2) the question is, does tweak (1) and tweak (2) work or break? but it would be a lot more effective i guess to get some udevd expert's attention on this ... Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Andrew Morton [EMAIL PROTECTED] wrote: then the test of whether I bisected correctly is as simple as applying the commit and seeing if things break, because I'm running on the kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right now. Let me give that a try and I'll report back. Worst case, I'll have to start over and write off the past four days... Gad. I trust the second time will be faster. git-bisect _is_ very error prone. I find one of the problems is that each step is so far apart in time that you forget what you were doing. Did I remember to test that iteration? Did I install the right kernel? etc. i have a fully automated bootup-hang bisection script. It is based on git-bisect run. I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it's a bad kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it) So i dont have to a single manual decision anytime during the bisection. But the scripts are very much tied to my ad-hoc test environment so it would not be of much general use. Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.23-rc3 can't see sd partitions on Alpha
* Bob Tracy [EMAIL PROTECTED] wrote: Finally got back in town. Starting the git-bisect process. I've got a relatively slow network connection, and the PWS 433au isn't exactly what I would call fast by modern standards, so bear with me while I get things set up and crank through this. The clone of the 2.6 tree will take several more hours to finish downloading. I anticipate the best pace I'll be able to manage after that is two iterations in a 24- hour period. once you are done with the download of the initial cloned git repository (which is 200MB+), all the bisection steps will be local and you'll be only limited by kernel rebuild speed and by bootup and testing speed, not by network bandwidth. ( once you have the cloned repository i'd suggest for you to keep it - that way you can track susequent kernels via git-pull and it uses a very network-efficient delta protocol. ) Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
* Kamalesh Babulal [EMAIL PROTECTED] wrote: So 2.6.24-rc3 was OK and 2.6.24-rc3-git2 is not? Yes, the 2.6.24-rc3 was Ok and this is seen from 2.6.24-rc3-git2/3/4. just to make sure: this is a real lockup and failed bootup (or device init), not just a message, right? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] 2.6.24-rc3-git2 softlockup detected
* Kamalesh Babulal [EMAIL PROTECTED] wrote: Hi Ingo, This softlockup is seen in the 2.6.24-rc4 either and looks like a message because this is seen while running tbench and machine continues running other test's after the softlockup messages and some times seen with the bootup, but the machines reaches the login prompt and able to continue running tests. do you know whether there's any true delay when this happens, or is it a pure softlockup-detector false positive? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/2] 2.6.23-rc2: known regressions with patches
* Michal Piotrowski [EMAIL PROTECTED] wrote: Memory management Subject : [bug] SLUB freeing locks References : http://lkml.org/lkml/2007/7/26/90 Last known good : ? Submitter : Ingo Molnar [EMAIL PROTECTED] Caused-By : ? Handled-By : Peter Zijlstra [EMAIL PROTECTED] Patch : http://lkml.org/lkml/2007/7/26/97 Status : patch available fixed by commit 2208b764c14d0f1ad63da64b1a42db6077b6fe42. Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove name length check in a workqueue
yeah ... cannot remember why i have done it originally :-| Acked-by: Ingo Molnar [EMAIL PROTECTED] Ingo On Wed, 10 Aug 2005, James Bottomley wrote: Ingo, This has been in the workqueue code in day one, for no real reason that I can see. We just tripped over it in SCSI because the fibre channel transport class creates one workqueue per host with the name scsi_wq_%d which trips this after we get to 100. Unfortunately we just came across someone with 100 host adapters ... I think the solution is just to get rid of the artificial limit. James diff --git a/kernel/workqueue.c b/kernel/workqueue.c --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -308,8 +308,6 @@ struct workqueue_struct *__create_workqu struct workqueue_struct *wq; struct task_struct *p; - BUG_ON(strlen(name) 10); - wq = kmalloc(sizeof(*wq), GFP_KERNEL); if (!wq) return NULL; - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html