RE: [t13] UDma < Pio for byte count negotiation?

Mcgrath, Jim Fri, 25 Jan 2002 17:27:48 -0800

This message is from the T13 list server.



Pat,

I think you're still a bit confused about how devices actually work:

    What I think I'm seeing is that AtapiPio (like parallel Scsi, 
    printer port Scsi, and UsbMass) give us a protocol for the host 
    and the device to negotiate an agreed count of bytes to move which way.

    In AtapiPio, this protocol is the sum of Cylinder (aka ByteCount)
    values given together with DRQ INTRQ, any time C/D I/O is consistently
    the expected x00 DataOut or x02 DataIn.  AtapiDma gives us no such
    protocol, not yet.

This is simply not true.  The host and device NEVER negotiate the number of
bytes to be transferred per command, no matter what the protocol used is.
The Host always DEMANDS, via the command bytes, the exact number of bytes
the device must receive on writes (data out), and LIMITS the maximum number
of bytes the device can transfer on reads (data in).  Even here, the host
DEMANDS the exact number of bytes read for transferring user data - it is
only with device specific control information that the device is allowed to
transfer fewer bytes than the host allocated space for.

The pace at which the data is transferred during a command is a different
matter - and is more complicated.  On reads the device must have the data
buffered before it can send it to the host.  The host in turn must have the
resources to receive the data.  On writes the opposite is true - the host is
sending, and the device is receiving.  The pacing of these transfers is done
via a flow control mechanism.  Note that the total number of bytes
transferred for the command was already decided at the command level.

In both ATA and SCSI it is the device that drives the pacing.  In SCSI the
device does this via sending REQs to the host, which are either matched in
real time with data (for reading) or give the host permission later on to
send to the device a corresponding number of bytes, clocked by the ACK line.
8 bit and 16 bit SCSI behave in identical ways.  There are never any
counters for the host to read (just signal transitions on a wire called
REQ).

In ATA the PIO protocol is very different.  The device is once again in
charge of pacing the transfers, but does so in groups of words (i.e. blocks)
rather than single words, as with SCSI.  This is done via the INTRQ signal.
Once detected by the host, the PIO protocol requires the host to drive the
transfer of every word in that block, regardless of whether it is a read or
write.  The device supplies a value (size of this block) to the host to
allow it to perform this function for that specific block for ATAPI - for
ATA the block size is fixed, and so is not supplied for every block.  Note
that once the block transfer has begun, there is no way for the device to
provide any pacing information to the host - the transfers by the host are
"blind."

You may have been thinking of this with respect to 8 bit SCSI.  There is a
"blind" IO transfer mode for 8 bit SCSI similar to this (i.e. no pairing of
REQs and ACKs) that Apple implemented many years ago.  It is not however in
the SCSI standard, and in general is not supported in the SCSI world.
Otherwise ATA PIO and SCSI transfers have always differed in that SCSI gives
the parties flow control at the word level, while PIO just gives it at the
block level.
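
The block-level pacing of ATAPI PIO described above can be sketched roughly
as follows.  This is only an illustration under stated assumptions: the
class SimulatedAtapiDevice, its methods, and pio_data_in are hypothetical
names invented for this sketch, not anything defined by ATA/ATAPI or by a
real driver.  The device reports a byte count once per block, and the host
then moves every word of that block without further pacing from the device.

```python
class SimulatedAtapiDevice:
    """Hypothetical stand-in for an ATAPI device doing PIO data-in."""

    def __init__(self, payload, max_block):
        self.payload = bytes(payload)
        self.max_block = max_block   # largest block the device offers, in bytes
        self.offset = 0
        self.block = b""
        self.word_index = 0

    def next_block_byte_count(self):
        """Stage the next block; return the size the device would report."""
        count = min(self.max_block, len(self.payload) - self.offset)
        self.block = self.payload[self.offset:self.offset + count]
        self.offset += count
        self.word_index = 0
        return count

    def read_data_word(self):
        """One 16-bit data register read; an odd tail is padded with zero."""
        start = 2 * self.word_index
        chunk = self.block[start:start + 2]
        self.word_index += 1
        return int.from_bytes(chunk.ljust(2, b"\x00"), "little")


def pio_data_in(device):
    """Host side: read the block size once, then move every word 'blind'."""
    received = bytearray()
    block_sizes = []
    while True:
        count = device.next_block_byte_count()
        if count == 0:               # nothing left to transfer
            break
        block_sizes.append(count)
        words = (count + 1) // 2     # 16-bit bus: an odd count rounds up
        block = bytearray()
        for _ in range(words):       # no per-word pacing from the device
            block += device.read_data_word().to_bytes(2, "little")
        received += block[:count]    # drop the pad byte, if any
    return bytes(received), block_sizes
```

For instance, pio_data_in(SimulatedAtapiDevice(bytes(range(7)), max_block=4))
would walk two blocks, one of 4 bytes and one of 3 bytes.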

For single word and multiple word DMA the host is still required to transfer
the individual words, but there are no values provided to the host via
registers.  Instead, the host can transfer the words until the device breaks
out of the DMA burst by deasserting DMARQ - this is how the device indicates
to the host that it cannot continue that particular DMA burst.  When the
device is ready to continue it can request a new DMA burst by asserting
DMARQ.
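
As a rough, idealized sketch of the burst behaviour just described (not a
model of real signal timing): the function dma_bursts and its parameters are
hypothetical, and the sketch assumes the host stops on the very word where
the device deasserts DMARQ, with no latency.  The host reads no counter; it
only watches whether DMARQ is still asserted.

```python
def dma_bursts(total_words, device_ready_runs):
    """Return the length in words of each DMA burst for one command.

    total_words: words the command must move (fixed at the command level).
    device_ready_runs: how many words the device can handle before it
        deasserts DMARQ, one entry per burst (an assumed input).
    """
    bursts = []
    remaining = total_words
    for ready in device_ready_runs:
        if remaining == 0:
            break
        if ready <= 0:
            continue                 # device never asserted DMARQ this time
        moved = 0
        dmarq_asserted = True        # device requests a burst
        while dmarq_asserted:
            moved += 1               # host moves one word
            remaining -= 1
            # device drops DMARQ when it runs dry or the command is done
            dmarq_asserted = moved < ready and remaining > 0
        bursts.append(moved)
    return bursts
```

Called as dma_bursts(10, [4, 4, 4]), the sketch yields three bursts whose
lengths sum to the 10 words the command asked for.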

For Ultra DMA the same protocol is used - no registers for the host to read,
just the use of DMARQ by the device to terminate a burst.  One key
difference is that the device drives the sending of read data, not the host.
This allows for faster read performance.  Another is that the host sends a
CRC value to the device at the end of every burst.  Otherwise it is the same
as with the older DMA modes.

Note that in all transfer modes the host can pause data transfer for its own
purposes.  In PIO and the single word and multiple word DMA it always drives
the transfer (read and write), as with UDMA for writes.  It just pauses the
driving of the transfer.  For UDMA writes the host can terminate the burst
(using the STOP signal) just as the device can (using DMARQ).

So each hardware protocol accomplishes the pacing of the data transfers in a
slightly different way.  None of this has anything to do with what anyone's
software does, since the software/hardware interface is just not covered by
the standards.  Not only is it platform dependent, but often chip set vendor
specific.  So if there are issues at those levels, it's important for people
to identify the products and/or API they are using.

As an example, I'm sure that some hardware requires software to read the
device registers during a command using ATAPI PIO to get a value to program
the host hardware.  I'm also sure that some hardware can do this on their
own, and make the entire process invisible to software.  None of that is
SCSI, ATA, ATAPI, etc... - it is the host hardware/software interface.  For
DMA operations no host hardware requires this sort of programming at all
(i.e. there is no concept of using INTRQ to pace data flow with DMA).  There
are certainly other counters to program (i.e. how much data the host can
accept/wants to send), but nothing that is an exact match to the counter
used in ATAPI PIO.  



Jim




-----Original Message-----
From: Pat LaVarre [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 25, 2002 10:35 AM
To: [EMAIL PROTECTED]
Subject: [t13] UDma < Pio for byte count negotiation?


> Subject: ... Question: ATAPI (CDROM) sector size for DMA
> "Mcgrath, Jim" <[EMAIL PROTECTED]> 01/24/02 06:03PM
> Rather than go down this particular path again in more detail,
> I'd ask Mark if his question was answered
> (and if not what additional information he specifically needs to know).

I like this plan.  Accordingly, I've changed the subject line for this
email, so we can hope for the old thread to grow our understanding of the
problem Mark is trying to solve.

I did here BC Mark, but I did not CC Mark, in the hope of encouraging people
here to address replies either only to [EMAIL PROTECTED], or else only to
[EMAIL PROTECTED], in accord with Usenet netiquette.

> [EMAIL PROTECTED] 01/25/02 08:43AM
> Pat, I really don't want to get started on this again but 

In this thread, per Hale's explicit encouragement, I'm going to try again to
focus on how plainly AtapiPio can or cannot count data bytes more precisely
than AtapiDma can.  I'm hoping we can all resist the urge to go lose
ourselves in a new discussion over how many or few of us should care about
this relative deficiency.

> there are some "read" commands
> that can transfer a variable amount of data.
> Just use PIO for these commands.
...
> Where's the beef? (Sorry I just couldn't help myself.)

Consensus?

Yes indeed, Udma != Pio at least some of the time.  Yes indeed, even for
Cdb's well standardised before both the host and the device shipped.

Very good, this is a start.

> In both PIO mode and DMA mode
> the host must be prepared to receive
> [however much] data [the device requests to move]
> otherwise you have a "hung device" problem
> that requires a reset to fix.

I think we agree here.  (Hurrah!)

I'd say that a host that chooses instead to move the unexpected data risks
livelock: this device may ask to move data forever.  And I'd say that a host
that chooses to move data in an unexpected direction risks more than a host
that just gives up and resets.

> Pat, I still don't understand
> where/what the problem(s) is(are)
> with ATAPI DMA operations.

I appreciate - indeed I am amazed by - your continued patient interest.

What I think I'm seeing is that AtapiPio (like parallel Scsi, printer port
Scsi, and UsbMass) give us a protocol for the host and the device to
negotiate an agreed count of bytes to move which way.

In AtapiPio, this protocol is the sum of Cylinder (aka ByteCount) values
given together with DRQ INTRQ, any time C/D I/O is consistently the expected
x00 DataOut or x02 DataIn.  AtapiDma gives us no such protocol, not yet.

With AtapiPio, as with legacy asynchronous 8 bit Scsi, the device requests a
count of data bytes to move, and the host moves that much or less.

Aye, with a 16-bit data bus, the _device_ can't know exactly how many bytes
moved: the host half of a bus trace to move x81 bytes matches the host half
of a bus trace to move x82 bytes.  In both, we see x82 bytes of data clock
across the bus.

But with AtapiPio, still the _host_ can know if the count of bytes that
moved was the precisely agreed count, precisely something less, or if the
device asked to move more.  In both traces, Ansi tells us where the last
byte is.  But in the trace of moving x81 bytes, the odd x1F5:1F4 Cylinder
(aka ByteCount) in the bus trace tells us that last byte we saw clock across
the bus was a pad byte.
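
The arithmetic behind the x81 versus x82 example can be sketched as below.
The helper names are hypothetical, chosen only for this illustration; the
one assumption is a 16-bit data bus, so bytes always clock across in pairs.

```python
def words_on_bus(byte_count):
    """16-bit words needed to carry byte_count bytes (odd counts round up)."""
    return (byte_count + 1) // 2


def bytes_clocked(byte_count):
    """Bytes that actually cross a 16-bit bus, including any pad byte."""
    return 2 * words_on_bus(byte_count)


def last_byte_is_pad(byte_count):
    """True when the ByteCount is odd, so the final byte on the bus is a pad."""
    return byte_count % 2 == 1
```

Here bytes_clocked(0x81) and bytes_clocked(0x82) both come to 0x82, and only
the odd ByteCount tells the host that the last byte was a pad byte.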

So far so good?

Now, with AtapiDma of any kind, vs. AtapiPio, we lose the lo bit of x1F4
CylinderLo (aka ByteCountLo).  With AtapiDma, all of the bus trace of a move
of x81 bytes matches the bus trace of a move of x82 bytes: not only the host
half, now also the device half.

Last time we got lost in a discussion of how often the bus trace of the
accompanying command out did or did not match.  I'd like to stay focused
here on counting data bytes, if we may.  ISO Network people would tell us
I'm asking to focus on the network layer that agrees specifically which
bytes moved which way, rather than the layer that discusses what they mean.

With just this much agreed, we can say AtapiPio counts bytes moving either
way better than AtapiDma does, whenever the total count is odd.

Alternatively, we could say AtapiPio knows how to let the host and device agree
to move arbitrary counts of bytes, but AtapiDma doesn't know how to let the
host and device agree to move an odd count of bytes.

So far so good?

Now also with AtapiDma, as burst rate increases, the receiver of data clocks
loses more and more control over when the data clocks stop.

For the specific case of UDma33 (UDma Mode 2), as many as 5 bytes may clock
across the bus past when the receiver said please stop.

All the same, moving Atapi UDma data In is no worse than AtapiDma in
general.  Yea, so maybe the AtapiDma trace shows a few more bytes clocked
across the bus than the AtapiPio trace, but still any _host_ that bothers to
look can know if the device agreed to move within one of as many bytes as
expected, or less, or more.

But moving Atapi UDma data Out is worse than AtapiDma in general.  With
Atapi UDma Out, the host is as blind as the device was with Atapi UDma in.

Aye, the host and the device can agree perfectly how bytes clocked across
the bus.  Indeed, the host and the device can agree practically perfectly
how many bytes were included in the UDma Crc.  But did the device willingly
request the last few pairs of those bytes?  No one can say.

We can say AtapiPio knows how to let the host and device agree to move
arbitrary counts of bytes, but Atapi UDma Out doesn't know how to let the
host and device agree to move counts of blocks unless the block sizes are
somehow agreed out of band to be larger than the indeterminacy designed into
any given burst rate.

The Cdb x 3B 02 00:00:00:00 0 01:FD 0 is a good short concrete example of
indeterminacy in UDma byte counts out.  By the quasi-public standards
commonly understood to apply, this Cdb means a WriteBuffer to move precisely
x1FD bytes out to address 00:00:00 in buffer 00.
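
As a hedged illustration of how that Cdb breaks down, the sketch below
decodes it on the assumption that it follows the usual 10-byte SCSI
WRITE BUFFER layout (opcode, mode, buffer ID, 3-byte buffer offset, 3-byte
length, control).  The function name, the dictionary keys, and the 5-bit mode
mask are choices made for this sketch, not something stated in this thread.

```python
def decode_write_buffer_cdb(cdb):
    """Split a 10-byte WRITE BUFFER Cdb into its fields (assumed layout)."""
    cdb = bytes(cdb)
    if len(cdb) != 10 or cdb[0] != 0x3B:
        raise ValueError("not a 10-byte WRITE BUFFER (x3B) Cdb")
    return {
        "mode": cdb[1] & 0x1F,       # assumption: mode sits in the low bits
        "buffer_id": cdb[2],
        "buffer_offset": int.from_bytes(cdb[3:6], "big"),
        "length": int.from_bytes(cdb[6:9], "big"),
        "control": cdb[9],
    }


# The example Cdb from the text, written out byte by byte.
EXAMPLE_CDB = bytes([0x3B, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xFD, 0x00])
```

Decoding EXAMPLE_CDB gives a length of x1FD, an odd count, which is why a
16-bit bus has to clock x1FE bytes to carry it.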

Aye, a host that knows, in negotiation, to offer to move no more than x1FE
bytes can know whether the device chose to move within one of x1FE bytes
out, or something less.  But if less, the host can't know precisely how much
less.

And now consider a host that, in negotiation, offers to move x400 bytes out.
The trace can look the same, no matter whether the device actually wanted to
stop at x1FD, x1FE, x1FF, x200, x201, x202 ....

Me, I've been simulating the negotiation of byte count transferred for the
cases where the host and the device have not agreed perfectly out of band in
advance.

There I see that the count of bytes clocked across the bus actually does
vary.  For the example here, in AtapiPio, the count is always x1FE.  With
AtapiDma, the count jitters.  The largest counts seen are x1FD + X * 2 + 1,
where X is the pause indeterminacy designed into AtapiDma i.e. 2 for UDma33,
and larger above that.
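
The counts in this example can be worked through with a small sketch.  The
function names are hypothetical; the second function simply restates the
formula above for this odd-count example, with X taken as 2 for UDma33 as
stated, and is not a general model of any burst rate.

```python
def pio_count(requested_bytes):
    """Bytes clocked in AtapiPio: the request rounded up to a whole word."""
    return 2 * ((requested_bytes + 1) // 2)


def largest_dma_count(requested_bytes, pause_words):
    """Largest count per the formula above: requested + X * 2 + 1."""
    return requested_bytes + pause_words * 2 + 1


REQUESTED = 0x1FD        # bytes the WriteBuffer example asks to move
UDMA33_PAUSE_WORDS = 2   # X for UDma33, as given in the text
```

With these values pio_count(REQUESTED) is x1FE, while
largest_dma_count(REQUESTED, UDMA33_PAUSE_WORDS) is x202, that is, 5 bytes
past the x1FD the Cdb asked for.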

If my simulations are wrong, then what have I misunderstood?

If my simulations are correct, then can we obsolete a new xA1
IdentifyPacketDevice bit?

This new bit would let devices advertise that they offer precise byte count
negotiation, by way of reporting specifically the residue of bytes that
clocked across the bus without having been willingly requested.

I figure to report this residue we can most naturally use the 16 bits of the
heretofore unused x1F5:1F4 Cylinder ports at x03 StatusIn time.

Already I've got devices programmed to do this, but as a host currently I'm
negotiating out of band to discover precisely which devices can.  I'd prefer
a standard bit, of course.

Thanks again in advance.    Pat LaVarre

Subscribe/Unsubscribe instructions can be found at www.t13.org.