On Wed, 09 Feb 2005, [EMAIL PROTECTED] wrote:
> > seems like sdev->shost is bogus when fc_remote_port_block() is
> > called...
>
> We haven't seen this in our testing....
>
Actually it's not the sdev->host that's bogus. It appears the sdev
is referenced after it has been freed, through an entry still present on
the shost->__devices list. Here's the scenario:
* 1 lun connected to HBA1 -- sdev created for the lun via rport:
*** sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c
* mid-layer performs a linear scan of non-existent IDs (via
scsi_sysfs_target_initialize()):
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=1 emp=0
...
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=509 emp=0
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=510 emp=0
*** adding sdev=dd2bc738 sdev->siblings=dd2bc740, __dev=d36f0000 id=511 emp=0
* remove lun from the fabric (port-side cable pull).
* driver recognizes the loss via RSCN and issues fc_remote_port_block(),
which goes through starget_for_each_device() -> shost_for_each_device() ->
__scsi_iterate_devices(), where scsi_device_get() is called to take a
reference on each sdev.
1st sdev valid (ok):
*** ctr=0 sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c id=0
*** sdev=d76196e8 host=d36f0000 state=2 gdev=d761987c
2nd sdev invalid -- note old sdev (dd2bc738) from previous linear
scan:
*** ctr=0 sdev=dd2bc738 host=6b6b6b6b state=1802201963 gdev=dd2bc8cc id=1802201963
*** sdev=dd2bc738 host=6b6b6b6b state=1802201963 gdev=dd2bc8cc
[BLAH]
Unable to handle kernel paging request at virtual address 6b6b6be7
printing eip:
c028ef06
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: qla2322 qla2xxx
CPU: 0
EIP: 0060:[<c028ef06>] Not tainted VLI
EFLAGS: 00010086 (2.6.11-rport)
EIP is at scsi_device_get+0x56/0xa0
eax: 6b6b6b6b ebx: dd2bc738 ecx: c035f844 edx: fffffffa
esi: dd2bc8cc edi: d36f0000 ebp: 00000001 esp: df693dd4
ds: 007b es: 007b ss: 0068
Process qla2322_1_dpc (pid: 11316, threadinfo=df692000 task=d9fa8530)
Stack: c0341fcc dd2bc738 6b6b6b6b 6b6b6b6b dd2bc8cc dd2bc738 d76196f0 c028f011
       c0341ff4 00000000 dd2bc738 6b6b6b6b 6b6b6b6b dd2bc8cc 6b6b6b6b 00000282
       d76196e8 d76196e8 ddd7e790 d36f0000 c029af50 c028f0bd 00000000 dbe8512c
Call Trace:
[<c028f011>] __scsi_iterate_devices+0x71/0xb0
[<c029af50>] fc_device_block+0x0/0x10
[<c028f0bd>] starget_for_each_device+0x6d/0x80
[<c029afff>] fc_remote_port_block+0x3f/0x70
[<e08633d3>] qla2x00_mark_device_lost+0x53/0xe0 [qla2xxx]
The signature is very consistent.
Another quirk: with no storage connected to the HBAs, loading and then
unloading the driver consistently hits a BUG() in _raw_spin_lock() via
scsi_forget_host():
kernel BUG at include/asm/spinlock.h:149!
invalid operand: 0000 [#1]
SMP
Modules linked in: qla2322 qla2xxx
CPU: 1
EIP: 0060:[<c030b373>] Not tainted VLI
EFLAGS: 00010096 (2.6.11-rport)
EIP is at _spin_lock_irqsave+0x53/0x60
eax: 0000000e ebx: 00000282 ecx: c035f80c edx: 00000082
esi: 6b6b6bab edi: d86f1ecc ebp: d348d530 esp: d86f1ea4
ds: 007b es: 007b ss: 0068
Process rmmod (pid: 11209, threadinfo=d86f0000 task=d348d530)
Stack: c031e548 c030960c 6b6b6ba3 6b6b6bab c030960c 00000000 d348d530 c0117610
       00000000 00000000 0000006b d3920000 6b6b6b6b da0c3b74 d3920000 6b6b6b6b
       d86f0000 c03097d3 d392002c 0000006b c0297656 6b6b6b63 d3920000 6b6b6b63
Call Trace:
[<c030960c>] __down+0x3c/0xe0
[<c030960c>] __down+0x3c/0xe0
[<c0117610>] default_wake_function+0x0/0x10
[<c03097d3>] __down_failed+0x7/0xc
[<c0297656>] .text.lock.scsi_sysfs+0x8/0x22
[<c0296061>] scsi_forget_host+0x31/0x60
[<c028f3e1>] scsi_remove_host+0x11/0x60
[<e08629df>] qla2x00_remove_one+0x1f/0x40 [qla2xxx]
[<c01f9108>] pci_device_remove+0x28/0x30
[<c024cc04>] device_release_driver+0x74/0x80
[<c024cc28>] driver_detach+0x18/0x30
[<c024d13c>] bus_remove_driver+0x5c/0xa0
[<c024d6c8>] driver_unregister+0x8/0x30
[<c01f933b>] pci_unregister_driver+0xb/0x20
[<c013279e>] sys_delete_module+0x16e/0x190
[<c014b61a>] unmap_vma_list+0x1a/0x30
[<c014b9c5>] do_munmap+0x115/0x160
[<c014ba5a>] sys_munmap+0x4a/0x70
[<c010308d>] sysenter_past_esp+0x52/0x75
Code: 90 80 3e 00 7e f9 fa eb e8 89 d8 8b 74 24 0c 8b 5c 24 08 83 c4 10 c3 c7 04 24 48 e5 31 c0 8b 44
The host variable seems to be hosed. Perhaps I'm doing something wrong
during shutdown: I call just the standard scsi_remove_host(). I also tried
adding the fc_remove_host() call (as directed in the comments), but got
the same results...
Andrew Vasquez