On Friday 20 November 2015, Finn Thain wrote:
> 
> On Thu, 19 Nov 2015, Ondrej Zary wrote:
> 
> > On Thursday 19 November 2015 03:24:56 Finn Thain wrote:
> >
> > > On Wed, 18 Nov 2015, Ondrej Zary wrote:
> > >
> > > >
> > > > I have some NCR5380 ISA cards and can test them.
> > >
> > > Thanks Ondrej. I've no idea which ISA drivers are presently working in 
> > > mainline. Finding regressions may be more difficult than usual ;-)
> > 
> > You're right... looks very broken:
> > 
> > [   62.577194] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> > n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, 
> > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, USLEEP_POLL 
> > 3, USLEEP_WAITLONG 1250, options { AUTOPROBE_IRQ PSEUDO_DMA 
> > NCR53C400 }
> > [   62.796635] scsi 2:0:0:0: Direct-Access     IBM      0663             e  
> >   PQ: 0 ANSI: 2
> > [   63.878494] sd 2:0:0:0: Attached scsi generic sg1 type 0
> > [   95.848260] sd 2:0:0:0: aborting command
> > 
> > And the system hangs completely.
> > 
> 
> Yes. That was the usual failure mode. The old EH abort routine is fatal. 
> Up until I disabled PDMA by default for mac_scsi (in v3.19), that driver 
> would do the same thing.
> 
> > It's much better with your patches, but still not great :)
> > 
> 
> Pleased to hear it :)
> 
> > [   93.963264] pnp 01:01.00: [io  0x0240-0x025f]
> > [   93.963493] pnp 01:01.00: [irq 5]
> > [   93.965768] pnp 01:01.00: activated
> > [   93.977147] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> > n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, 
> > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, options { 
> > AUTOPROBE_IRQ PSEUDO_DMA }
> > [   93.987527] scsi host2: rejecting message
> > [   93.987647] Synchronous Data Transfer Request period = 100 ns offset = 12
> > [   94.001219] scsi 2:0:0:0: Direct-Access     IBM      0663             e  
> >   PQ: 0 ANSI: 2
> > [  113.000794] sd 2:0:0:0: Attached scsi generic sg1 type 0
> 
> I'd be interested to know what commands were in play in that 19 second 
> interval. Might need to use scsi_logging_level to figure that out.
> 
> My tests involved 3 different scsi targets (two disks and a CD-ROM) but 
> none of these send an SDTR. Your log says the driver correctly rejected the 
> SDTR message but that doesn't mean the target actually went to MSG IN 
> phase and got the message. Do you have any older targets you can test?

Yes, I have some older disks and CD-ROMs too. This one was just handy in
an external enclosure (the card has only an external DB25 connector). It can
be opened easily so I'll test the other devices too.

> > [  144.852432] sd 2:0:0:0: [sdb] Unit Not Ready
> > [  144.852574] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current]
> > [  144.852713] sd 2:0:0:0: [sdb] Add. Sense: Select or reselect failure
> 
> AFAIK, the target should not have to abort any commands. Moreover, the 
> target should never experience a select/reselect failure, because you have 
> irq == 0 (see above) and that implies that the target is never permitted 
> the disconnect privilege.
> 
> > [  240.108292] INFO: task modprobe:1957 blocked for more than 120 seconds.
> > [  240.108418]       Not tainted 4.3.0-rc1+ #74
> 
> Why not use v4.3?

I already had that built, so I just quickly applied the patches and tested. I
have to update the git tree anyway, as ACPI is broken.

> > [  240.108501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [  240.108597] modprobe        D 0000001a     0  1957   1950 0x00000000
> > [  240.108790]  ce0fad00 00000086 53881781 0000001a c1525f88 4edbe39c 
> > 0000001a 04ac33e5
> > [  240.109246]  00000000 ccd54000 ffffffff ffffffff d204b280 c139c504 
> > 00000000 c104416d
> > [  240.109699]  00000000 ce0fad00 c1054a45 c151fd8c c151fd8c d204b280 
> > 00000000 ccd6d100
> > [  240.110156] Call Trace:
> > [  240.110295]  [<c139c504>] ? schedule+0x5b/0x67
> > [  240.110430]  [<c104416d>] ? async_synchronize_cookie_domain+0x73/0x9f
> > [  240.110569]  [<c1054a45>] ? abort_exclusive_wait+0x6e/0x6e
> > [  240.110699]  [<c10ac9bc>] ? do_init_module+0xa4/0x1a3
> > [  240.110824]  [<c107ddb5>] ? load_module+0x14de/0x18ca
> > [  240.110948]  [<c107e2a0>] ? SyS_finit_module+0x47/0x56
> > [  240.111068]  [<c139e2c0>] ? sysenter_do_call+0x12/0x12
> 
> Not sure what module was being probed here. I presume it was g_NCR5380 or 
> g_NCR5380_mmio. Neither of these calls 'scsi_scan_host'. I'm not sure what 
> the implications are (?)

It was g_NCR5380 (DTCT-436P card).

> > [  240.852458] sd 2:0:0:0: [sdb] Read Capacity(10) failed: Result: 
> > hostbyte=DID_TIME_OUT driverbyte=DRIVER_SENSE
> > [  240.852620] sd 2:0:0:0: [sdb] Sense Key : Aborted Command [current]
> > [  240.852760] sd 2:0:0:0: [sdb] Add. Sense: Select or reselect failure
> > [  272.852471] sd 2:0:0:0: [sdb] Write Protect is off
> > [  272.852614] sd 2:0:0:0: [sdb] Mode Sense: 00 00 00 00
> > [  304.084452] sd 2:0:0:0: [sdb] Asking for cache data failed
> > [  304.084592] sd 2:0:0:0: [sdb] Assuming drive cache: write through
> 
> This looks like nonsense to me ... I don't think the target actually 
> aborted the reselection phase of a read capacity command. I'm out of ideas 
> here. Can anyone else make sense of this?


-- 
Ondrej Zary