On Sat, 18 Feb 2017, Ondrej Zary wrote:

> On Friday 17 February 2017 23:38:12 Finn Thain wrote:
> > On Thu, 16 Feb 2017, Ondrej Zary wrote:
> > > On Tuesday 31 January 2017 02:31:45 Finn Thain wrote:
> > > [...]
> > >
> > > > Are you trying to figure out which commands are going to 
> > > > disconnect during a transfer? This is really a function of the 
> > > > firmware in the target; there are no good heuristics AFAICT, so 
> > > > the PDMA algorithm has to be robust. mac_scsi has to cope with 
> > > > this too.
> > > >
> > > > Does the problem go away when you assign no IRQ? When 
> > > > instance->irq == NO_IRQ, the core driver will inhibit disconnect 
> > > > privileges.
> > >
> > > Yes, it seems to run fine with IRQ/disconnect disabled. With IRQ 
> > > enabled, "dd if=/dev/sr0 of=anything" stops after a while.
> >
> > When you say "stops", do you mean an infinite loop in 
> > generic_NCR5380_pread or does the loop complete (which would 
> > presumably leave the target stuck in DATA IN phase, and a scsi command 
> > timeout would probably follow after 30 seconds...)
> 
> I've added timeouts to the possibly-infinite loops. It times out waiting 
> for the host buffer to become ready.

In mac_scsi you'll find that the PDMA loop exploits the 15ms poll_politely 
time limit to give the target device time to catch up. You might want to 
do something similar.

> Then everything breaks. Now I found why: pread() returns without waiting 
> for the 53C80 registers to be ready.

Ouch! You can't return control to the core driver when the 53C80 core is 
unavailable. That would need special handling: the core driver would have 
to fail the scsi command and reset the device (and bus), based on the 
result you return from NCR5380_dma_recv_setup/NCR5380_dma_send_setup.

> Adding the wait allows to continue in PIO mode with "tag#0 switching to 
> slow handshake".
> 

I don't think this is the code path you want. The target isn't really 
broken. But yes, we could use PIO as a slow workaround for fragile PDMA 
logic.

> > > I get gated 53C80 IRQ, BASR=0x10, MODE=0x0e, STATUS=0x7c.
> >
> > You can use NCR5380_print() to get a decoded register dump.
> >
> > When I decode the above values I get,
> >
> > BASR   = 0x10 = BASR_IRQ
> > MODE   = 0x0e = MR_ENABLE_EOP_INTR | MR_MONITOR_BSY | MR_DMA_MODE
> > STATUS = 0x7c = SR_BSY | SR_REQ | SR_MSG | SR_CD | SR_IO
> >
> > Since BASR_PHASE_MATCH is not set, the interrupt is almost certainly a 
> > phase mismatch. The new phase is SR_MSG | SR_CD | SR_IO = PHASE_MSGIN, 
> > which shows that the target has given up on the DATA IN phase and is 
> > presumably trying to send a DISCONNECT message.
> 
> Looks you're right. The transfersize is 4096, i.e. 2 CD-ROM sectors. 
> When the 2nd sector is not ready in the drive internal buffer, the drive 
> probably disconnects which breaks the fragile pdma transfer. When the 
> transfersize is limited to 2048 bytes, the problem goes away.
> 

OK.

> The problem also went away with enabled interrupts because I had some 
> debug printks added which were slowing down the transfer enough for the 
> drive (SONY CDU-55S) to keep up and not disconnect. Got the same result 
> by inserting udelay(100) after each 128 byte block trasnferred in 
> pread().
> 

Well, yeah, if you change the timing and omit to mention that, as well as 
changing the spinlock_irq_save/restore pairs, then it's going to be 
difficult for me to make sense of your results. Anyway, I'm glad that you 
got to the bottom of this mystery.

> > > When I enable interrupts during PDMA (like the removed UNSAFE macro 
> > > did), the problems go away. I see an IRQ after each pread call.
> >

-- 

Reply via email to