This message is from the T13 list server.
Pat,
This might be a root cause of the misunderstanding here (from your email):
The API between a PC and a 16-bit Ide Dma engine is a request to clock
across at most N * 2 bytes either in or out.
All other things being equal, with UDma, a 16-bit Ide Dma engine will
sometimes clock more "word"s across the bus than with SwDma.
I assume you meant Multi word DMA, not SwDma (i.e. I don't know what SwDma
is). In any event, this is not a correct statement at the command level,
which is the only level that is relevant. A quick look at DATA IN
operations:
In a "traditional" PC with an ATA host chip connected to a device, if the
host indicates via the command parameters (passed in the registers with ATA;
in the packet with ATAPI) that it wants at most N * 2 bytes transferred,
then the device will NEVER clock out more than N * 2 bytes (i.e. N clocks)
to the host in UDMA mode for DATA IN. This holds both for native ATA
commands and for encapsulated SCSI commands transferred via the PACKET
command.
The host software will allocate buffer memory (enough for all of the data in
the command, i.e. N * 2, but see PS) and set up the DMA channel for the
COMMAND before the command starts. After the command is issued to the
device the host and device hardware take over. Note that the host is
actually a bridge - typically a PCI to ATA bridge (even motherboard chipsets
will often model this internally as an ATA to PCI bridge). The host has to
manage DMA bursts both on the 32 bit wide PCI bus and the 16 bit wide ATA
bus. It
has to handle the various word alignment issues, and the fact that competing
traffic (especially on the PCI bus) could easily break up simple
relationships between the PCI and UDMA bursts (e.g. the data for a single
UDMA burst could end up spread over multiple PCI bus bursts, etc...). The
host PCI DMA controller handles any details like scatter gather in the host
memory, as previously programmed by the host software.
As Hale pointed out in an earlier email, there are never any "extra bytes"
on the ATA bus. The device can never send more bytes than the host
indicated in the command. At the UDMA burst level there are never any
"extra" bytes because there is no meaning to the concept of the host "having
a number of bytes in mind for the DMA burst." The only thing important for
the host in this case is making sure it does not drop bytes due to a buffer
overrun, which is why it can PAUSE or STOP a UDMA burst. The ATA standard
tells the host the maximum number of bytes that it will still have to
receive after asserting PAUSE, so it can make sure it never gets a buffer
overrun.
In general the conceptual issue here may be equating a PIO data block
transfer with a DMA burst - these simply should not be equated. DMA bursts
are an even number of bytes, and sum up to no more than the bytes in the
command parameters per command, but otherwise are not restricted at all. A
sequence of bursts may be 18 bytes followed by 288 bytes followed by 1562
bytes followed by 4 bytes, etc... While you may see a more regular pattern
in many test situations, you cannot design assuming such a pattern. PIO and
DMA are equivalent only at the command level (they each transfer the same
number of words per command).
On DATA OUT, you indicated the following to be important:
I mean now to focus on the case of UDma Data Out when the receiver
chooses to move less than the max permitted by the command, especially
when this happens without the receiver reporting an ERR.
So what? In SCSI this is perfectly legal (see the SCSI Architecture Model,
SAM-2). You cannot transfer MORE bytes than indicated in the command, but
you can always transfer fewer bytes (this actually applies to both DATA IN
and DATA OUT). If this is an error, then it is up to the device and the
host to detect that from information other than the number of bytes
transferred (indeed, SAM explicitly tells designers not to make assumptions
about command termination solely from the number of bytes transferred).
In practice most commands using DATA OUT that I know of do require that the
number of bytes transferred and the transfer count in the command be the
same for successful termination, but this is a command specific issue. If
you are not cracking the command information (and indeed, most PCI to ATA
host controllers do not crack the command), then you must rely on the host
and the device to make the determination of whether the command has
successfully completed or not (since they do have command specific
knowledge).
Jim
PS a host with small amounts of buffer memory can set up a physical buffer
smaller than the command transfer size at the start of the command, but it
is then responsible for stopping the data transfers at the right points and
reusing or reallocating buffer space as the command progresses, in order to
make sure that all data is transferred successfully. Since this usually
requires CPU intervention, commands are generally small, and memory is
inexpensive, it is not common practice on PCs today. It used to be common on
some SCSI adapters that had on-board memory or addressing limitations (64
Kbytes used to be a popular limit). Today small adapter memories may be used
to buffer the transfer to large host resident buffers, but in practice these
memories play more the role of a traditional speed matching FIFO than of an
I/O buffer.
-----Original Message-----
From: Pat LaVarre [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 12, 2001 9:05 AM
To: [EMAIL PROTECTED]
Subject: [t13] to UDma from SwDma - for a 16-bit Ide Dma engine?
> "Mcgrath, Jim" <[EMAIL PROTECTED]> 12/11/01 06:56PM
I find your replies consistently helpful: may I ask for yet one more?
> you keep on insisting ... resisting ...
I heard these but left them to the end, on the theory that the apparent
miscommunication here is an illusion.
> receiver ... clocking sender's data ... abnormal
Yes, very, kudos to UDma folk for fixing this.
> clocking ... sender ... [implies] no need
> ... for the receiver to have any preknowledge
> of the number of bytes to be transferred
I likewise see no need, unless we accept the common desire of host folks to
limit in advance how much receiver memory the sender may write.
> ... Pio ... UDma ...
> The bridge ... just as well for either protocol
Do we invite less confusing digressions if we speak in terms of the 16-bit
Ide Dma engines of broader concern to many here rather than in terms of the
bridges to Atapi well known to me?
Can we agree ...
The API between a PC and a 16-bit Ide Dma engine is a request to clock
across at most N * 2 bytes either in or out.
All other things being equal, with UDma, a 16-bit Ide Dma engine will
sometimes clock more "word"s across the bus than with SwDma.
This will happen whenever the device stops data out prematurely, unless the
Dma engine had coincidentally, independently, chosen to delay for a
turnaround time just when the device decided to stop data out.
These coincidences that discourage extra "word"s may happen most often at
mutually agreed block boundaries.
> you keep on insisting
> that the receiver "requests" bytes
> during a data transfer from a sender.
Sorry, I think my terminology comes from Scsi, where the REQ line is the REQ
line even when it is the ACK line that is clocking data out.
I mean to say the receiver's only advance indication of how many bytes to
move which way is its interpretation of the Scsi command block, but the
command block per se never specifies more than the max count of bytes to
move.
I mean now to focus on the case of UDma Data Out when the receiver chooses
to move less than the max permitted by the command, especially when this
happens without the receiver reporting an ERR.
> you keep on referring to ATAPI/PIO as the model,
> where actually [counting arbitrary bytes in Ide]
> is a clever hack.
I think I see now that standard Atapi Pio can count requested bytes
accurately only by the accident that there the device requests, via BSY:DRQ
= 0:1, a count of bytes to move rather than a count of 16-bit words to move.
Sorry I was ever clueless enough to think otherwise.
> The API between a PC and a 16-bit Ide Dma engine
> is a request to clock across
> at most N * 2 bytes either in or out.
This is the only API I've ever seen in Ide source code for Wintel drivers.
I'm told this is the standard API in Windows - both the '9X/ME and 2K/XP
flavours. I know this is the API between a Usb host and a generic UsbMass
device, that API being designed to ape the Pio legacy of Wintel Atapi.
In all these cases, the API specifies the content of the Scsi command block
completely independent of the max count of bytes to move which way.
I'm hoping that this is accordingly the API of a typical 16-bit Dma engine
on a Wintel PC motherboard.
How wrong am I?
Thanks again in advance. Pat LaVarre
Subscribe/Unsubscribe instructions can be found at www.t13.org.