There is a bug in the SCSI CD-ROM driver of Linux 2.0.36.
2.0.38 has the same bug, and the suspicious code in 2.2.13 looks similar,
so I think it has not been fixed yet.
This bug report refers to 2.0.36.
I have a data CD-R that my CD-ROM drive can't read properly; the error
correction fails.
I can mount this CD-R because the inner tracks are OK.
But when I read the files in certain directories, the kernel prints
a lot of error messages and sometimes a register/stack dump.
When this happens the system hangs and nothing works anymore.
The error is reproducible: it is not possible to read
the whole CD-R without a system crash.
I compiled my kernel with the -g -DDEBUG options to see what happens.
Here is the kernel output on the console:
... a lot of scsi error messages
command : 08 04 b8 96 0e 00 00 00 14 42
internal_cmnd (host = 0, channel = 0, target = 5, command = 00e84234,
buffer = 0001d000,
bufflen = 28627, done = 0481229c)
queuecommand : routine at 00188608)
leaving internal_cmnd()
Leaving scsi_do_cmd()
In scsi_done(host = 0, result = 000002)
Calling done function - at address 0481229c
sr.c done: 28000002 225a400
scsi_free 0001d200 1024
scsi_free 0001d000 512
(2281998 36 0) stack segment: 0000
CPU: 0
EIP: 0010:[<04812616>]
EFLAGS: 00010286
eax: 00000000 ebx: 00000000 ecx: 0000232c edx: 00000000
esi: 00e84214 edi: 00000000 ebp: c800130c esp: 001ab0c0
ds: 0018 es: 0018 fs: 002b gs: 0000 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=001a9254)
Stack: 00000a93 00e84214 00000000 00000028 0009712e 0009e2b8 0001d200 0000001c
       0001d000 00000a93 00000004 00000000 00000038 28000002 0017810c 00000000
       002b8024 002b8068 00000246 00000246 001ab17c 002b8068 00000000 002b8068
Call Trace: [<0017810c>] [<0017c4b6>] [<00183690>] [<0010cd45>] [<0010cb53>] [<001623b6>] [<00109ab5>]
       [<00109b23>] [<0010aae5>] [<0010971c>] [<001094d8>] [<00117df8>] [<00111ff8>]
Code: 83 7c 15 04 00 74 12 8b 44 15 08 50 8b 44 15 00 50 e8 50 63
Aiee, killing interrupt handler
kfree of non-kmalloced memory: 001ab29c, next= 00000100, order=1749660
kfree of non-kmalloced memory: 001ab28c, next= 001ab29c, order=1749660
kfree of non-kmalloced memory: 001ab7a0, next= 001ab28c, order=1749660
idle task may not sleep
idle task may not sleep
idle task may not sleep
idle task may not sleep
idle task may not sleep
That's what ksymoops says about the dump:
Using `System.map' to map addresses to symbols.
Trace: 17810c <scsi_done+7e4/7f0>
Trace: 17c4b6 <aic7xxx_done_cmds_complete+3a/4c>
Trace: 183690 <do_aic7xxx_isr+5c/74>
Trace: 10cd45 <do_IRQ+65/88>
Trace: 10cb53 <IRQ15_interrupt+5f/84>
Trace: 1623b6 <apm_do_idle+42/80>
Trace: 109ab5 <hard_idle+29/5c>
Trace: 109b23 <sys_idle+3b/70>
Trace: 10aae5 <system_call+55/7c>
Trace: 10971c <init>
Trace: 1094d8 <start_kernel+1d4/1e0>
Trace: 117df8 <it_real_fn>
Trace: 111ff8 <schedule+234/28c>
Code:
Code: 83 7c 15 04 00 cmpl $0x0,0x4(%ebp,%edx,1)
Code: 74 12 je 19 <_EIP+0x19>
Code: 8b 44 15 08 movl 0x8(%ebp,%edx,1),%eax
Code: 50 pushl %eax
Code: 8b 44 15 00 movl 0x0(%ebp,%edx,1),%eax
Code: 50 pushl %eax
Code: e8 50 63 00 90 call 90006366 <_EIP+0x90006366>
Code: 90 nop
Code: 90 nop
The error happens in the sr_mod.o module.
I disassembled it with objdump -S and found this section:
for(i=0; i<SCpnt->use_sg; i++) {
5ff: 31 db xorl %ebx,%ebx
601: 66 85 c0 testw %ax,%ax
604: 74 32 je 638 <rw_intr+0x3a0>
606: 89 f6 movl %esi,%esi
if (sgpnt[i].alt_address) {
608: 8d 04 5b leal (%ebx,%ebx,2),%eax
60b: 8d 14 85 00 00 leal 0x0(,%eax,4),%edx
610: 00 00
612: 83 7c 15 04 00 cmpl $0x0,0x4(%ebp,%edx,1)
617: 74 12 je 62b <rw_intr+0x393>
scsi_free(sgpnt[i].address, sgpnt[i].length);
619: 8b 44 15 08 movl 0x8(%ebp,%edx,1),%eax
61d: 50 pushl %eax
61e: 8b 44 15 00 movl 0x0(%ebp,%edx,1),%eax
622: 50 pushl %eax
623: e8 fc ff ff ff call 624 <rw_intr+0x38c>
}
628: 83 c4 08 addl $0x8,%esp
62b: 43 incl %ebx
62c: 8b 4c 24 3c movl 0x3c(%esp,1),%ecx
630: 0f b7 41 42 movzwl 0x42(%ecx),%eax
634: 39 c3 cmpl %eax,%ebx
636: 7c d0 jl 608 <rw_intr+0x370>
}
The kernel crashes in
if (sgpnt[i].alt_address) {
more precisely in
612: 83 7c 15 04 00 cmpl $0x0,0x4(%ebp,%edx,1)
This code is in drivers/scsi/sr.c, line 273, in function rw_intr.
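To make the addressing in that instruction explicit, here is a small user-space sketch. It assumes, from the offsets 0x0, 0x4 and 0x8 in the disassembly, a 12-byte scatter-gather entry on this 32-bit machine. The names sg_entry_model and alt_address_ea are made up for this illustration and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of one scatter-gather entry, inferred only from
 * the offsets used in the disassembly; not copied from the kernel headers. */
struct sg_entry_model {
    uint32_t address;      /* offset 0x0, passed to scsi_free() */
    uint32_t alt_address;  /* offset 0x4, the field tested by the cmpl */
    uint32_t length;       /* offset 0x8, passed to scsi_free() */
};

/* The loop indexes with i * 12, so the model must be exactly 12 bytes. */
static_assert(sizeof(struct sg_entry_model) == 12,
              "model must match the i * 12 stride in the disassembly");

/* Offset read by "cmpl $0x0,0x4(%ebp,%edx,1)", with ebp = sgpnt. */
static uint32_t alt_address_ea(uint32_t sgpnt, uint32_t i)
{
    uint32_t eax = i + i * 2;   /* leal (%ebx,%ebx,2),%eax : i * 3  */
    uint32_t edx = eax * 4;     /* leal 0x0(,%eax,4),%edx  : i * 12 */
    return sgpnt + edx
         + (uint32_t)offsetof(struct sg_entry_model, alt_address);
}
```

With ebp = c800130c and edx = 0 (so i = 0) from the register dump, alt_address_ea(0xc800130c, 0) gives c8001310 as the offset the cmpl tried to read.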
The addresses in the register dump can be mapped to the offsets in the
objdump listing
by computing EIP - 04812004, because the kernel prints
Calling done function - at address 0481229c
and that is function
00000298 <rw_intr>:
static void rw_intr (Scsi_Cmnd * SCpnt)
{
298: 83 ec 28 subl $0x28,%esp
The difference between 0481229c and 298 is 04812004, so
EIP: 0010:[<04812616>] corresponds to offset 612 in the objdump listing.
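The same arithmetic written as a small C sketch, in case someone wants to repeat it for another dump. The function names are made up for this illustration. It assumes the module is loaded as one contiguous block, so that a single base offset applies to every address in it.

```c
#include <assert.h>
#include <stdint.h>

/* Load base of the module: runtime address of a known function minus the
 * offset of that function in the objdump listing. */
static uint32_t module_load_base(uint32_t runtime_addr, uint32_t objdump_offset)
{
    assert(runtime_addr >= objdump_offset);
    return runtime_addr - objdump_offset;
}

/* Translate a runtime EIP into an offset in the objdump listing. */
static uint32_t eip_to_objdump_offset(uint32_t eip, uint32_t load_base)
{
    assert(eip >= load_base);
    return eip - load_base;
}
```

With the values from this report, module_load_base(0x0481229c, 0x298) gives 04812004, and eip_to_objdump_offset(0x04812616, 0x04812004) gives 612.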
I think the problem starts in drivers/scsi/sr.c, line 198, function
rw_intr:
if (good_sectors > 0)
{ /* Some sectors were read successfully. */
Here the variable good_sectors is 2, so the kernel executes this branch,
where it calls
scsi_free(sgpnt[i].address, sgpnt[i].length);
scsi_free(SCpnt->buffer, SCpnt->sglist_len); /* Free list of scatter-gather pointers */
and
good_sectors -= 2;
After that good_sectors is 0, because in line 245
printk("(%x %x %x) ",SCpnt->request.bh,
SCpnt->request.nr_sectors,
good_sectors);
prints (2281998 36 0) to the console.
Then the code after line 264
if (good_sectors == 0) {
/* We only come through here if no sectors were read successfully. */
is also executed, but the memory holding sgpnt[i].alt_address in line 273
if (sgpnt[i].alt_address) {
has already been freed, and so the kernel crashes.
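The control flow I describe can be modelled in user space as follows. This is only a simplified sketch of my reading of rw_intr, not the kernel source: rw_intr_model is a made-up name, and the real freeing is replaced by a counter of how often the scatter-gather list is walked.

```c
#include <assert.h>

/* Simplified model of the two consecutive tests in rw_intr as described in
 * this report. Returns how many times the scatter-gather list is walked;
 * a result of 2 means the second walk touches memory freed by the first. */
static int rw_intr_model(int good_sectors)
{
    int sg_walks = 0;

    assert(good_sectors >= 0);

    if (good_sectors > 0) {
        /* Some sectors were read successfully: the sg entries and the
         * list itself (SCpnt->buffer) are freed here. */
        sg_walks++;
        good_sectors -= 2;
    }

    if (good_sectors == 0) {
        /* Meant for "no sectors read", but also reached when the block
         * above has just brought good_sectors down to 0. */
        sg_walks++;
    }

    return sg_walks;
}
```

rw_intr_model(2) returns 2, which is the case seen here, while rw_intr_model(0) and rw_intr_model(4) return 1.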
I have already tried changing the if (good_sectors == 0) to an else,
but then the kernel crashes in sr.c at line 351
printk("SCSI CD error : host %d id %d lun %d return code = %03x\n",
scsi_CDs[DEVICE_NR(SCpnt->request.rq_dev)].device->host->host_no,
scsi_CDs[DEVICE_NR(SCpnt->request.rq_dev)].device->id,
scsi_CDs[DEVICE_NR(SCpnt->request.rq_dev)].device->lun,
result);
when accessing host_no, with the message "general protection: 0000".
My hardware:
AMD K6 200 MHz processor
Asus TX97 mainboard
Adaptec AHA-2940UW SCSI controller
Pioneer Super12x SCSI CD-ROM
(from /proc/pci):
Bus 0, device 10, function 0:
SCSI storage controller: Adaptec AIC-7881U (rev 0).
Medium devsel. Fast back-to-back capable. IRQ 15. Master
Capable. Latency=32. Min Gnt=8.Max Lat=8.
I/O at 0xd400.
Non-prefetchable 32 bit memory at 0xe4000000.
(from /proc/scsi/scsi):
Host: scsi0 Channel: 00 Id: 05 Lun: 00
Vendor: PIONEER Model: CD-ROM DR-U12X Rev: 1.06
Type: CD-ROM ANSI SCSI revision: 02
Please send me an e-mail if you need additional information.
Alexander