If James Bottomley, Mike Anderson, or anyone else from linux-scsi can comment on the material below, I would appreciate it.
On Mon, 10 Jan 2005, Srihari Vijayaraghavan wrote: > I am unable to trigger the bug, despite my sincere efforts. If nothing else, I have to admire your persistence in plugging and unplugging the drive 120 times over the course of a half-hour! > Please refer to the attachment for the complete kernel log. > > I think I understand the problem (conceptually I guess): > 1. Connect the USB DVD-RW drive > 2. USB Storage says: "usb-storage: device found at nnn" and "usb-storage: > waiting for device to settle before scanning" > 3. And then it waits for 5+ seconds > 4. Then it initialises the peripheral and then says: "Vendor: PIONEER > Model: > DVD-RW DVR-107D Rev: 1.13 ..." and "usb-storage: device scan complete" That's right. But there's more going on behind the scenes that you aren't aware of. Hotplug programs, haldaemon, automounters, and who knows what else. > I think if I unplug the device precisely the same moment when it initialises > (step 4), and by time that routine completes kernel is really unhappy (D > state process, Badness in scsi_device_set_state at > drivers/scsi/scsi_lib.c:1717, and/or oops) to discover that the drive is not > there anymore :-). It's not that simple. Your log contained two different sorts of errors. The first occurred right near the start: Jan 10 20:24:46 desktop kernel: usb 1-3: USB disconnect, address 3 Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5147]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff388 error 4 Jan 10 20:24:46 desktop kernel: 50-fstab-sync.h[5161]: segfault at 0000000000000000 rip 000000379706f6b2 rsp 0000007fbffff6d8 error 4 This looks like a problem in a user program, not in the kernel. I have no idea what 50-fstab-sync.h is. Part of udev perhaps? Greg, do you know? The second problem occurred during a period in which you replugged the drive three times! Here's the relevant part of the log: Jan 10 20:48:37 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 111 Jan 10 20:48:37 desktop kernel: scsi111 : SCSI emulation for USB Mass Storage devices Jan 10 20:48:37 desktop kernel: usb-storage: device found at 111 Jan 10 20:48:37 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:48:42 desktop kernel: Vendor: PIONEER Model: DVD-RW DVR-107D Rev: 1.13 Jan 10 20:48:42 desktop kernel: Type: CD-ROM ANSI SCSI revision: 00 Jan 10 20:48:42 desktop kernel: sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray Jan 10 20:48:42 desktop kernel: Attached scsi CD-ROM sr1 at scsi111, channel 0, id 0, lun 0 Jan 10 20:48:42 desktop kernel: usb-storage: device scan complete Jan 10 20:48:42 desktop kernel: usb 1-3: USB disconnect, address 111 That looks okay. But you don't see the hotplug etc. programs that are still running, keeping the device below scsi111 open. Jan 10 20:48:56 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 112 Jan 10 20:48:56 desktop kernel: scsi112 : SCSI emulation for USB Mass Storage devices Jan 10 20:48:56 desktop kernel: usb-storage: device found at 112 Jan 10 20:48:56 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:49:01 desktop kernel: usb-storage: device scan complete Jan 10 20:49:01 desktop kernel: usb 1-3: USB disconnect, address 112 That's normal also. Host scsi112 was created and apparently destroyed. Jan 10 20:49:08 desktop kernel: usb 1-3: new high speed USB device using ehci_hcd and address 113 Jan 10 20:49:08 desktop kernel: scsi113 : SCSI emulation for USB Mass Storage devices Jan 10 20:49:08 desktop kernel: usb-storage: device found at 113 Jan 10 20:49:08 desktop kernel: usb-storage: waiting for device to settle before scanning Jan 10 20:49:12 desktop kernel: In bus_reset, srb: 0000010037d71640 Jan 10 20:49:12 desktop kernel: device: 0000010038e3d000 Jan 10 20:49:12 desktop kernel: host: 000001002b500c00 Jan 10 20:49:12 desktop kernel: hostdata: 000001002b549c00 This is the first indication of something funny. I can't think how a bus reset would get initiated by the kernel -- that only happens after a timeout. Maybe it came from a user program? Jan 10 20:49:12 desktop kernel: ehci_hcd 0000:00:10.4: port 3 reset error -110 Jan 10 20:49:12 desktop kernel: hub 1-0:1.0: hub_port_status failed (err = -32) Jan 10 20:49:13 desktop kernel: usb 1-3: reset high speed USB device using ehci_hcd and address 113 Jan 10 20:49:13 desktop kernel: scsi: Device offlined - not ready after error recovery: host 111 channel 0 id 0 lun 0 Jan 10 20:49:13 desktop kernel: 111:0:0:0: Illegal state transition deleted->offline Jan 10 20:49:13 desktop kernel: Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1717 Jan 10 20:49:13 desktop kernel: Jan 10 20:49:13 desktop kernel: Call Trace:<ffffffffa0006808>{:scsi_mod:scsi_device_set_state+264} Jan 10 20:49:13 desktop kernel: <ffffffffa000481c>{:scsi_mod:scsi_error_handler+2732} Jan 10 20:49:13 desktop kernel: <ffffffff8010ebf3>{child_rip+8} <ffffffffa0003d70>{:scsi_mod:scsi_error_handler+0} Jan 10 20:49:13 desktop kernel: <ffffffff8010ebeb>{child_rip+0} And here's the weird part. Apparently the bus reset was not for the current device attached to scsi113, but for the old incarnation attached to scsi111 -- which is long since gone! That accounts for the illegal state transition. But why did it happen like this in the first place? Could it be caused by a user program doing an SG_SCSI_RESET ioctl while still holding the device open after scsi_remove_host has finished? Jan 10 20:49:13 desktop kernel: usb 1-3: USB disconnect, address 113 Jan 10 20:49:18 desktop kernel: In bus_reset, srb: 0000010037d71240 Jan 10 20:49:18 desktop kernel: device: 0000010038f09000 Jan 10 20:49:18 desktop kernel: host: 000001002b500400 Jan 10 20:49:18 desktop kernel: hostdata: 000001002b549c00 Jan 10 20:49:18 desktop kernel: scsi: Device offlined - not ready after error recovery: host 113 channel 0 id 0 lun 0 Jan 10 20:49:18 desktop kernel: usb-storage: device scan complete Now it's not surprising that the reset, which was intended for scsi111, has interfered with the proper operation of scsi113. Alan Stern ------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ linux-usb-devel@lists.sourceforge.net To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel