On Wed, 13 Feb 2008 13:43:24 -0800
Tim Pepper <[EMAIL PROTECTED]> wrote:
> We recently upgraded a production x86_64 machine with serveraid
> cards to 2.6.24 and noted that /proc/scsi/scsi showed garbage for our
> serveraid service processors. sg_inq also returned garbage from the
> service processors' sg devices. After a few iterations I started seeing
> meaningful stuff in the garbage. Not sure if it was returning live memory
> or just unzero'd. Either way not good so we went back to a known good,
> older kernel and tried to repro on a similar machine. We got different,
> but still bad results in terms of pointing at memory badness.
>
> FWIW, the original machine had the following hardware:
> scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4H>
> scsi1 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4M>
> and the repros have been on a machine with just:
> scsi0 : IBM PCI ServeRAID 7.12.05 Build 761 <ServeRAID 4Mx>
>
> On the repro machine I'm getting a hang on ips driver load with the following
> logged:
>
> Feb 13 13:16:08 ipstest kernel: [ 915.236563] scsi3 : IBM PCI ServeRAID
> 7.12.05 Build 761 <ServeRAID 4Mx>
> Feb 13 13:16:08 ipstest kernel: [ 915.236839] Unable to handle kernel NULL
> pointer dereference at 0000000000000000 RIP:
> Feb 13 13:16:08 ipstest kernel: [ 915.236863] [check_addr+16/80]
> check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [ 915.237209] PGD 79fff067 PUD 7a898067 PMD 0
> Feb 13 13:16:08 ipstest kernel: [ 915.237341] Oops: 0000 [1] SMP
> Feb 13 13:16:08 ipstest kernel: [ 915.237463] CPU 1
> Feb 13 13:16:08 ipstest kernel: [ 915.239436] Modules linked in: ips aic94xx
> Feb 13 13:16:08 ipstest kernel: [ 915.239559] Pid: 5213, comm: scsi_scan_3
> Not tainted 2.6.23-ips_as_module #3
> Feb 13 13:16:08 ipstest kernel: [ 915.239692] RIP: 0010:[check_addr+16/80]
> [check_addr+16/80] check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [ 915.239932] RSP: 0018:ffff810076d87900
> EFLAGS: 00010082
> Feb 13 13:16:08 ipstest kernel: [ 915.240059] RAX: 0000000000000000 RBX:
> ffff81007b636300 RCX: 0000000000000024
> Feb 13 13:16:08 ipstest kernel: [ 915.240196] RDX: 000000007b636b00 RSI:
> ffffffff8077cde0 RDI: ffffffff806c4ed5
> Feb 13 13:16:08 ipstest kernel: [ 915.240332] RBP: ffff810076d87900 R08:
> 0000000000000500 R09: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [ 915.240468] R10: ffff81007aa33b40 R11:
> 0000000000000060 R12: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [ 915.240605] R13: 0000000000000001 R14:
> ffffffff8077cde0 R15: ffff81007aa33a80
> Feb 13 13:16:08 ipstest kernel: [ 915.240741] FS: 0000000000000000(0000)
> GS:ffff810001039300(0000) knlGS:0000000000000000
> Feb 13 13:16:08 ipstest kernel: [ 915.240981] CS: 0010 DS: 0018 ES: 0018
> CR0: 000000008005003b
> Feb 13 13:16:08 ipstest kernel: [ 915.241111] CR2: 0000000000000000 CR3:
> 0000000078a98000 CR4: 00000000000006e0
> Feb 13 13:16:08 ipstest kernel: [ 915.241248] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Feb 13 13:16:08 ipstest kernel: [ 915.241384] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Feb 13 13:16:08 ipstest kernel: [ 915.241520] Process scsi_scan_3 (pid:
> 5213, threadinfo ffff810076d86000, task ffff81007be26720)
> Feb 13 13:16:08 ipstest kernel: [ 915.241761] Stack: ffff810076d87930
> ffffffff802125c3 ffff81007aa33a80 ffff81007480cf50
> Feb 13 13:16:08 ipstest kernel: [ 915.242006] 0000000000000000
> ffff81007ba38ca8 ffff810076d87940 ffffffff8046fb42
> Feb 13 13:16:08 ipstest kernel: [ 915.242248] ffff810076d879c0
> ffffffff8801c2ee ffff81007aa33af0 000000017aa33af0
> Feb 13 13:16:08 ipstest kernel: [ 915.242389] Call Trace:
> Feb 13 13:16:08 ipstest kernel: [ 915.242606] [nommu_map_sg+115/144]
> nommu_map_sg+0x73/0x90
> Feb 13 13:16:08 ipstest kernel: [ 915.242736] [scsi_dma_map+66/96]
> scsi_dma_map+0x42/0x60
> Feb 13 13:16:08 ipstest kernel: [ 915.242867] [_end+124884230/2127548952]
> :ips:ips_next+0x33e/0xc00
> Feb 13 13:16:08 ipstest kernel: [ 915.242986] [scsi_done+0/48]
> scsi_done+0x0/0x30
> Feb 13 13:16:08 ipstest kernel: [ 915.243114] [_end+124896894/2127548952]
> :ips:ips_queue+0x106/0x1f0
> Feb 13 13:16:08 ipstest kernel: [ 915.243240] [scsi_dispatch_cmd+498/784]
> scsi_dispatch_cmd+0x1f2/0x310
> Feb 13 13:16:08 ipstest kernel: [ 915.243370] [scsi_request_fn+491/976]
> scsi_request_fn+0x1eb/0x3d0
> Feb 13 13:16:08 ipstest kernel: [ 915.243500]
> [__generic_unplug_device+37/48] __generic_unplug_device+0x25/0x30
> Feb 13 13:16:08 ipstest kernel: [ 915.243630]
> [blk_execute_rq_nowait+99/176] blk_execute_rq_nowait+0x63/0xb0
> Feb 13 13:16:08 ipstest kernel: [ 915.243761] [blk_execute_rq+122/224]
> blk_execute_rq+0x7a/0xe0
> Feb 13 13:16:08 ipstest kernel: [ 915.243889] [scsi_execute+240/288]
> scsi_execute+0xf0/0x120
> Feb 13 13:16:08 ipstest kernel: [ 915.244016] [scsi_execute_req+134/240]
> scsi_execute_req+0x86/0xf0
> Feb 13 13:16:08 ipstest kernel: [ 915.244145]
> [scsi_probe_and_add_lun+594/3472] scsi_probe_and_add_lun+0x252/0xd90
> Feb 13 13:16:08 ipstest kernel: [ 915.244279] [sas_expander_match+27/160]
> sas_expander_match+0x1b/0xa0
> Feb 13 13:16:08 ipstest kernel: [ 915.244412] [get_device+23/32]
> get_device+0x17/0x20
> Feb 13 13:16:08 ipstest kernel: [ 915.244534] [__scsi_scan_target+220/1696]
> __scsi_scan_target+0xdc/0x6a0
> Feb 13 13:16:08 ipstest kernel: [ 915.244665] [enqueue_entity+172/432]
> enqueue_entity+0xac/0x1b0
> Feb 13 13:16:08 ipstest kernel: [ 915.244793] [update_curr_load+135/160]
> update_curr_load+0x87/0xa0
> Feb 13 13:16:08 ipstest kernel: [ 915.244923]
> [__check_preempt_curr_fair+107/128] __check_preempt_curr_fair+0x6b/0x80
> Feb 13 13:16:08 ipstest kernel: [ 915.245057] [update_curr+258/272]
> update_curr+0x102/0x110
> Feb 13 13:16:08 ipstest kernel: [ 915.245186] [scsi_scan_channel+139/160]
> scsi_scan_channel+0x8b/0xa0
> Feb 13 13:16:08 ipstest kernel: [ 915.245315]
> [scsi_scan_host_selected+158/352] scsi_scan_host_selected+0x9e/0x160
> Feb 13 13:16:08 ipstest kernel: [ 915.245447] [do_scan_async+0/320]
> do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [ 915.245574] [do_scsi_scan_host+126/128]
> do_scsi_scan_host+0x7e/0x80
> Feb 13 13:16:08 ipstest kernel: [ 915.245703] [do_scan_async+23/320]
> do_scan_async+0x17/0x140
> Feb 13 13:16:08 ipstest kernel: [ 915.245832] [do_scan_async+0/320]
> do_scan_async+0x0/0x140
> Feb 13 13:16:08 ipstest kernel: [ 915.245962] [kthread+77/128]
> kthread+0x4d/0x80
> Feb 13 13:16:08 ipstest kernel: [ 915.246086] [child_rip+10/18]
> child_rip+0xa/0x12
> Feb 13 13:16:08 ipstest kernel: [ 915.246209] [kthread+0/128]
> kthread+0x0/0x80
> Feb 13 13:16:08 ipstest kernel: [ 915.246333] [child_rip+0/18]
> child_rip+0x0/0x12
> Feb 13 13:16:08 ipstest kernel: [ 915.246457]
> Feb 13 13:16:08 ipstest kernel: [ 915.246564]
> Feb 13 13:16:08 ipstest kernel: [ 915.246565] Code: 4c 8b 00 48 8d 04 0a 4c
> 39 c0 76 2b b8 fe ff ff ff 31 f6 49
> Feb 13 13:16:08 ipstest kernel: [ 915.246933] RIP [check_addr+16/80]
> check_addr+0x10/0x50
> Feb 13 13:16:08 ipstest kernel: [ 915.247062] RSP <ffff810076d87900>
> Feb 13 13:16:08 ipstest kernel: [ 915.247181] CR2: 0000000000000000
>
> I was able to narrow it down to the following commit; with it reverted
> the machine seems to run fine:
> commit 2f4cf91cc0a1f32f75e1fa0a4d70a9bc7340a302
> [SCSI] ips: convert to use the data buffer accessors
>
> Nothing looks overly suspicious in that patch per se, although based
> on the list archives it looks like related changes caused other drivers
> grief. I've tried a variety of things to get a little more debug info,
> but to no avail. If anybody has any suggestions, I'd appreciate them!
Really sorry about the bug.
I have some doubts about the breakup code, though I'm not sure you are
actually hitting that path. Does reverting only the breakup part work?
The patch below is against 2.6.24.
diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 5c5a9b2..acabb19 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -3251,34 +3251,52 @@ ips_done(ips_ha_t * ha, ips_scb_t * scb)
* the rest of the data and continue.
*/
if ((scb->breakup) || (scb->sg_break)) {
- struct scatterlist *sg;
- int i, sg_dma_index, ips_sg_index = 0;
-
/* we had a data breakup */
scb->data_len = 0;
- sg = scsi_sglist(scb->scsi_cmd);
-
- /* Spin forward to last dma chunk */
- sg_dma_index = scb->breakup;
- for (i = 0; i < scb->breakup; i++)
- sg = sg_next(sg);
-
- /* Take care of possible partial on last chunk */
- ips_fill_scb_sg_single(ha,
- sg_dma_address(sg),
- scb, ips_sg_index++,
- sg_dma_len(sg));
-
- for (; sg_dma_index < scsi_sg_count(scb->scsi_cmd);
- sg_dma_index++, sg = sg_next(sg)) {
- if (ips_fill_scb_sg_single
- (ha,
- sg_dma_address(sg),
- scb, ips_sg_index++,
- sg_dma_len(sg)) < 0)
- break;
- }
+ if (scb->sg_count) {
+ /* S/G request */
+ struct scatterlist *sg;
+ int ips_sg_index = 0;
+ int sg_dma_index;
+
+ sg = scb->scsi_cmd->request_buffer;
+
+ /* Spin forward to last dma chunk */
+ sg_dma_index = scb->breakup;
+
+ /* Take care of possible partial on last chunk */
+ ips_fill_scb_sg_single(ha,
+ sg_dma_address(&sg[sg_dma_index]),
+ scb, ips_sg_index++,
+ sg_dma_len(&sg[sg_dma_index]));
+
+ for (; sg_dma_index < scb->sg_count;
+ sg_dma_index++) {
+ if (ips_fill_scb_sg_single
+ (ha,
+ sg_dma_address(&sg[sg_dma_index]),
+ scb, ips_sg_index++,
+ sg_dma_len(&sg[sg_dma_index])) < 0)
+ break;
+
+ }
+
+ } else {
+ /* Non S/G Request */
+ (void) ips_fill_scb_sg_single(ha,
+ scb->
+ data_busaddr +
+ (scb->sg_break *
+ ha->max_xfer),
+ scb, 0,
+ scb->scsi_cmd->
+ request_bufflen -
+ (scb->sg_break *
+ ha->max_xfer));
+ }
scb->dcdb.transfer_length = scb->data_len;
scb->dcdb.cmd_attribute |=
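
For reference, the arithmetic in the "Non S/G Request" branch above can be
sketched in user space as follows. This is only an illustration, assuming
that sg_break counts the max_xfer-sized chunks already sent; struct
breakup_state and both helper functions are hypothetical names that do not
exist in ips.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the scb/ha fields the non-S/G branch reads. */
struct breakup_state {
	unsigned long data_busaddr;    /* bus address of the start of the buffer */
	unsigned int request_bufflen;  /* total length of the request in bytes */
	unsigned int sg_break;         /* assumed: max_xfer chunks already sent */
	unsigned int max_xfer;         /* largest transfer per command in bytes */
};

/* Address handed to the fill routine: start plus what was already sent. */
static unsigned long next_chunk_addr(const struct breakup_state *s)
{
	return s->data_busaddr + (unsigned long)s->sg_break * s->max_xfer;
}

/* Length handed to the fill routine: whatever is left of the request. */
static unsigned int remaining_len(const struct breakup_state *s)
{
	/* The subtraction would wrap if more was "sent" than requested. */
	assert(s->sg_break * s->max_xfer <= s->request_bufflen);
	return s->request_bufflen - s->sg_break * s->max_xfer;
}
```

With a 10000-byte request, max_xfer of 4096 and sg_break of 2, the next
chunk starts 8192 bytes into the buffer and 1808 bytes remain.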