On Sun, 21 Aug 2005, David Relson wrote:

> Greetings,
>
> I've got an external USB hard drive that is set up for system backups.
> Here's the configuration:
>
> kernel: 2.6.11-12mdk (Mandriva 10.2)
> USB Device: ID 04b4:6830 Cypress Semiconductor Corp. USB-2.0 IDE Adapter
> Drive: Seagate ST320082 Model: 2A (200GB IDE)
> FileSystem: ReiserFS
> Software: BackupPC
>
> Backing up my small machine (approx 5GB) works fine. Backing up my big
> machine (approx 60GB) generally fails, with device and file system
> errors like:
>
> Aug 16 08:18:12 osage kernel: scsi: Device offlined - not ready after error
> recovery: host 12 channel 0 id 0 lun 0
> Aug 16 08:18:12 osage kernel: SCSI error : <12 0 0 0> return code = 0x50000
> Aug 16 08:18:12 osage kernel: end_request: I/O error, dev sda, sector 24743185
> Aug 16 08:18:12 osage kernel: ReiserFS: sda1: warning: vs-13070:
> reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of
> [14 6127116 0x0 SD]
> Aug 16 08:18:12 osage kernel: scsi12 (0:0): rejecting I/O to offline device
> Aug 16 08:18:12 osage kernel: Buffer I/O error on device sda1, logical block
> 7540335
>
> Suspecting problems due to bad blocks, yesterday about 08:00 I started
> running /sbin/badblocks to do a full read/write test, i.e. command:
>
> /sbin/badblocks -b 4096 -v -w /dev/usbhd
>
> This morning I found the job had hung. The "ps" command reports:
>
> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 4 0 13849 1 18 0 2336 1160 dio_aw D ? 61:25 /sbin/badblocks -b
> 4096 -v -w /dev/usbhd
>
> In /var/log/syslog I found:
>
> Aug 21 05:12:35 osage kernel: usb 4-2: USB disconnect, address 19
>
> Aug 21 05:13:04 osage kernel: usb 4-2: usb_sg_cancel, unlink --> -19
>
> So, badblocks ran for approx 21 hrs before the USB disconnect.
>
> Looking through last week's /var/log/syslog, I also found 2 kernel BUG
> reports from Aug 16 (at 08:18:29 and at 20:36:06). They're at the end
> of this message.
>
> Initially I thought this a ReiserFS problem, but now I suspect it to be
> USB, though I can't tell if it's hardware or software.
>
> Any suggestions of how to narrow down the problem in order to correct
> it?
>
> Also, the badblocks task seems to be unkillable in its "D" state. 3
> weeks ago I encountered the same problem (unkillable in "D") with
> BackupPC. Starting "usbview" results in yet another task in "D"
> state. Any suggestions how to nuke these tasks (other than the obvious
> -- reboot) ???
>
> Thanks.
>
> David

You can try using a 2.6.13 kernel. The error-recovery procedure in
usb-storage was changed, and it should be more robust.

Once tasks are stuck in a "D" state, there isn't much you can do about it.
It's possible that unplugging the USB cable (or turning off the disk
drive) would free them up. More useful would be to post a stack dump
(Alt-SysRq-T) showing exactly _where_ the processes are stuck.

> ########## Kernel Bug - Aug 16 08:18:29 ##########
>
> Aug 16 08:18:29 osage kernel: kernel BUG at <bad filename>:44267!
> Aug 16 08:18:29 osage kernel: invalid operand: 0000 [#1]
> Aug 16 08:18:29 osage kernel: Modules linked in: reiserfs nls_iso8859-1
> nls_cp437 vfat fat tun deflate zlib_deflate twofish serpent aes-i586 blowfish
> des sha256 sha1 md5 crypto_null xfrm_user ipcomp esp4 ah4 af_key af_packet
> tulip ide-cd loop udf sd_mod usb-storage scsi_mod amd-k7-agp agpgart ehci-hcd
> uhci-hcd usbcore ext3 jbd
> Aug 16 08:18:29 osage kernel: CPU: 0
> Aug 16 08:18:29 osage kernel: EIP: 0060:[try_to_free_buffers+107/144]
> Not tainted VLI
> Aug 16 08:18:29 osage kernel: EIP: 0060:[<c01b71eb>] Not tainted VLI
> Aug 16 08:18:29 osage kernel: EFLAGS: 00010246 (2.6.11-12mdk)
> Aug 16 08:18:29 osage kernel: EIP is at try_to_free_buffers+0x6b/0x90
> Aug 16 08:18:29 osage kernel: eax: 00000000 ebx: c100f980 ecx: c740beb8
> edx: fffffffb
> Aug 16 08:18:29 osage kernel: esi: 00000000 edi: 00000020 ebp: ce435e10
> esp: ce435dfc
> Aug 16 08:18:29 osage kernel: ds: 007b es: 007b ss: 0068
> Aug 16 08:18:29 osage kernel: Process BackupPC_dump (pid: 23597,
> threadinfo=ce434000 task=c6880a40)
> Aug 16 08:18:29 osage kernel: Stack: 010bf27d 010bf285 00000000 c100f980
> 00000000 ce435e28 e0a3187e c100f980
> Aug 16 08:18:30 osage kernel: 0001f077 ce435ee0 fffffffb ce435f6c
> e0a33138 ce435ee0 00000020 0be53f89
> Aug 16 08:18:30 osage kernel: 00000000 00000020 0001f077 ce435ee0
> fffffffb 00000001 ce435e78 c016e9d8
> Aug 16 08:18:30 osage kernel: Call Trace:
> Aug 16 08:18:30 osage kernel: [show_stack+127/160] show_stack+0x7f/0xa0
> Aug 16 08:18:30 osage kernel: [<c0103c4f>] show_stack+0x7f/0xa0
> Aug 16 08:18:30 osage kernel: [show_registers+342/464]
> show_registers+0x156/0x1d0
> Aug 16 08:18:30 osage kernel: [<c0103de6>] show_registers+0x156/0x1d0
> Aug 16 08:18:30 osage kernel: [die+200/336] die+0xc8/0x150
> Aug 16 08:18:30 osage kernel: [<c0103fe8>] die+0xc8/0x150
> Aug 16 08:18:30 osage kernel: [do_invalid_op+184/208] do_invalid_op+0xb8/0xd0
> Aug 16 08:18:30 osage kernel: [<c0104498>] do_invalid_op+0xb8/0xd0
> Aug 16 08:18:30 osage kernel: [error_code+43/48] error_code+0x2b/0x30
> Aug 16 08:18:30 osage kernel: [<c01038eb>] error_code+0x2b/0x30
> Aug 16 08:18:30 osage kernel: [pg0+542795902/1068971008]
> reiserfs_unprepare_pages+0x2e/0x70 [reiserfs]
> Aug 16 08:18:30 osage kernel: [<e0a3187e>]
> reiserfs_unprepare_pages+0x2e/0x70 [reiserfs]
> Aug 16 08:18:30 osage kernel: [pg0+542802232/1068971008]
> reiserfs_file_write+0x748/0x770 [reiserfs]
> Aug 16 08:18:30 osage kernel: [<e0a33138>] reiserfs_file_write+0x748/0x770
> [reiserfs]
> Aug 16 08:18:30 osage kernel: [vfs_write+376/384] vfs_write+0x178/0x180
> Aug 16 08:18:30 osage kernel: [<c01b28c8>] vfs_write+0x178/0x180
> Aug 16 08:18:30 osage kernel: [sys_write+75/128] sys_write+0x4b/0x80
> Aug 16 08:18:30 osage kernel: [<c01b299b>] sys_write+0x4b/0x80
> Aug 16 08:18:30 osage kernel: [sysenter_past_esp+82/117]
> sysenter_past_esp+0x52/0x75
> Aug 16 08:18:30 osage kernel: [<c0102e5d>] sysenter_past_esp+0x52/0x75
> Aug 16 08:18:30 osage kernel: Code: 85 c0 eb 10 8b 58 04 89 04 24 e8 91 01 00
> 00 3b 5d f4 89 d8 75 ee 89 f2 83 c4 0c 89 d0 5b 5e 5d c3 89 1c 24 e8 47 11 fe
> ff eb d2 <0f> 0b eb ac 89 1c 24 8d 45 f4 89 44 24 04 e8 e2 fe ff ff 89 c6
>
> ########## Kernel Bug - Aug 16 20:36:06 ##########
>
> Aug 16 20:36:06 osage kernel: kernel BUG at <bad filename>:44267!
> Aug 16 20:36:06 osage kernel: invalid operand: 0000 [#2]
> Aug 16 20:36:06 osage kernel: Modules linked in: reiserfs nls_iso8859-1
> nls_cp437 vfat fat tun deflate zlib_deflate twofish serpent aes-i586 blowfish
> des sha256 sha1 md5 crypto_null xfrm_user ipcomp esp4 ah4 af_key af_packet
> tulip ide-cd loop udf sd_mod usb-storage scsi_mod amd-k7-agp agpgart ehci-hcd
> uhci-hcd usbcore ext3 jbd
> Aug 16 20:36:06 osage kernel: CPU: 0
> Aug 16 20:36:06 osage kernel: EIP: 0060:[try_to_free_buffers+107/144]
> Not tainted VLI
> Aug 16 20:36:06 osage kernel: EIP: 0060:[<c01b71eb>] Not tainted VLI
> Aug 16 20:36:06 osage kernel: EFLAGS: 00010246 (2.6.11-12mdk)
> Aug 16 20:36:06 osage kernel: EIP is at try_to_free_buffers+0x6b/0x90
> Aug 16 20:36:06 osage kernel: eax: 20000020 ebx: c12e3340 ecx: cd93b438
> edx: fffffffb
> Aug 16 20:36:06 osage kernel: esi: 00000000 edi: 0000001f ebp: d9f19e10
> esp: d9f19dfc
> Aug 16 20:36:06 osage kernel: ds: 007b es: 007b ss: 0068
> Aug 16 20:36:06 osage kernel: Process BackupPC_dump (pid: 27627,
> threadinfo=d9f18000 task=d76dca60)
> Aug 16 20:36:06 osage kernel: Stack: 0000000f 00d58353 00000000 c12e3340
> 00000000 d9f19e28 e0a3187e c12e3340
> Aug 16 20:36:07 osage kernel: 0001e47c d9f19ee0 fffffffb d9f19f6c
> e0a33138 d9f19ee0 0000001f 00314822
> Aug 16 20:36:08 osage kernel: 00000000 0000001f 0001e47c d9f19ee0
> fffffffb 00000000 d9f19e64 c8b06858
> Aug 16 20:36:08 osage kernel: Call Trace:
> Aug 16 20:36:08 osage kernel: [show_stack+127/160] show_stack+0x7f/0xa0
> Aug 16 20:36:08 osage kernel: [<c0103c4f>] show_stack+0x7f/0xa0
> Aug 16 20:36:08 osage kernel: [show_registers+342/464]
> show_registers+0x156/0x1d0
> Aug 16 20:36:08 osage kernel: [<c0103de6>] show_registers+0x156/0x1d0
> Aug 16 20:36:08 osage kernel: [die+200/336] die+0xc8/0x150
> Aug 16 20:36:08 osage kernel: [<c0103fe8>] die+0xc8/0x150
> Aug 16 20:36:08 osage kernel: [do_invalid_op+184/208] do_invalid_op+0xb8/0xd0
> Aug 16 20:36:08 osage kernel: [<c0104498>] do_invalid_op+0xb8/0xd0
> Aug 16 20:36:08 osage kernel: [error_code+43/48] error_code+0x2b/0x30
> Aug 16 20:36:08 osage kernel: [<c01038eb>] error_code+0x2b/0x30
> Aug 16 20:36:08 osage kernel: [pg0+542795902/1068971008]
> reiserfs_unprepare_pages+0x2e/0x70 [reiserfs]
> Aug 16 20:36:08 osage kernel: [<e0a3187e>]
> reiserfs_unprepare_pages+0x2e/0x70 [reiserfs]
> Aug 16 20:36:08 osage kernel: [pg0+542802232/1068971008]
> reiserfs_file_write+0x748/0x770 [reiserfs]
> Aug 16 20:36:08 osage kernel: [<e0a33138>] reiserfs_file_write+0x748/0x770
> [reiserfs]
> Aug 16 20:36:08 osage kernel: [vfs_write+376/384] vfs_write+0x178/0x180
> Aug 16 20:36:08 osage kernel: [<c01b28c8>] vfs_write+0x178/0x180
> Aug 16 20:36:08 osage kernel: [sys_write+75/128] sys_write+0x4b/0x80
> Aug 16 20:36:08 osage kernel: [<c01b299b>] sys_write+0x4b/0x80
> Aug 16 20:36:08 osage kernel: [sysenter_past_esp+82/117]
> sysenter_past_esp+0x52/0x75
> Aug 16 20:36:08 osage kernel: [<c0102e5d>] sysenter_past_esp+0x52/0x75
> Aug 16 20:36:08 osage kernel: Code: 85 c0 eb 10 8b 58 04 89 04 24 e8 91 01 00
> 00 3b 5d f4 89 d8 75 ee 89 f2 83 c4 0c 89 d0 5b 5e 5d c3 89 1c 24 e8 47 11 fe
> ff eb d2 <0f> 0b eb ac 89 1c 24 8d 45 f4 89 44 24 04 e8 e2 fe ff ff 89 c6

These certainly look to me like bugs in the reiserfs code.

Alan Stern
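
For the "where are the processes stuck" question, something along these lines may help gather the information before posting. It is only a rough sketch in Python, not part of the original advice: the function names (parse_stat, blocked_tasks, request_task_dump) are made up for illustration, and it assumes a 2.6-style procfs that provides /proc/<pid>/stat, /proc/<pid>/wchan and /proc/sysrq-trigger. Writing "t" to /proc/sysrq-trigger is the scripted equivalent of Alt-SysRq-T; it needs root and a kernel built with magic SysRq support, and the dump lands in the kernel log.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every task in uninterruptible sleep ("D")."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited (or stat was unreadable) while scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan unavailable; report the task anyway
        yield pid, comm, wchan


def request_task_dump(trigger="/proc/sysrq-trigger"):
    """Ask the kernel to log a stack trace of every task (like Alt-SysRq-T)."""
    with open(trigger, "w") as f:
        f.write("t")
```

Looping over blocked_tasks() on the affected machine should list badblocks, BackupPC_dump or usbview if they are still stuck, together with the kernel function each one is sleeping in. Calling request_task_dump() as root and then reading the kernel log (dmesg or /var/log/syslog) gives the full stack traces.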