qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andre Noll
Hi. On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems connected to a qla2xxx card and used as a single volume via lvm. The system seems to lock up only if data gets written to both raid systems at the same time. On a standard kernel nothing makes it to the log, the system just

Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-02-26 Thread Andrew Vasquez
On Mon, 26 Feb 2007, Andre Noll wrote: On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems connected to a qla2xxx card and used as a single volume via lvm. The system seems to lock up only if data gets written to both raid systems at the same time. On a standard kernel nothing

SCSI devices with 256-byte sectors don't work?

2007-02-26 Thread Chuck Ebbert
Apparently there really are such devices:
Sep 28 20:05:42 localhost kernel: scsi4 : SCSI emulation for USB Mass Storage devices
Sep 28 20:05:42 localhost kernel: Vendor: Sandisk Model: ImageMate SDDR09 Rev: 0100
Sep 28 20:05:42 localhost kernel: Type: Direct-Access ANSI SCSI

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Alan wrote: I think that this is mostly true, but we also need to balance this against the need for higher levels to get a timely response. In a really large IO, a naive retry of a very large write could lead to a non-responsive system for a very large time... And losing the I/O could

Re: end to end error recovery musings

2007-02-26 Thread H. Peter Anvin
Theodore Tso wrote: In any case, the reason why I bring this up is that it would be really nice if there was a way with a single laptop drive to be able to do snapshots and background fsck's without having to use initrd's with device mapper. This is a major part of why I've been trying to

Re: end to end error recovery musings

2007-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Theodore Tso wrote: Can someone with knowledge of current disk drive behavior confirm that for all drives that support bad block sparing, if an attempt to write to a particular spot on disk results in an error due to bad media at that spot, the disk drive will automatically

Re: end to end error recovery musings

2007-02-26 Thread Alan
One interesting counterexample is a smaller write than a full page - say 512 bytes out of 4k. If we need to do a read-modify-write and it just so happens that 1 of the 7 sectors we need to read is flaky, will this look like a write failure? The current core kernel code can't handle