Hi,
    (yes, I tried trimming the quote, but it almost all seems vaguely
relevant, ugh)

At risk of embarrassing myself (again) by showing my ignorance ...

Mike, you said this:  "Disk becomes the major bottleneck on the faster
machines."

Doesn't FTP know the filesize of the file-to-be-downloaded ahead of
time?  If so, then perhaps you can try Eric's idea:

* create/open, seek to end-1 (in the empty output file), write a
single byte, close, reopen

This apparently avoids having DOS update the FAT over and over again
as the file grows. See the following thread for some examples (esp.
nidud's comment about Doszip:  "This reduced the compression time
from 455 to 180 sec."). A rough C sketch follows the link.

http://www.bttr-software.de/forum/board_entry.php?id=8862#p8879
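
In C the trick would look something like this (a rough sketch only:
the function name and error handling are mine, and 'size' would come
from the server's SIZE reply or the directory listing):

    #include <stdio.h>

    /* Pre-extend the output file so DOS allocates all of its
       clusters up front, instead of extending the FAT chain on
       every append.  'size' is the known download size.         */
    FILE *create_preallocated(const char *name, long size)
    {
        FILE *f = fopen(name, "wb");
        if (f == NULL)
            return NULL;
        if (size > 0) {
            fseek(f, size - 1L, SEEK_SET);  /* seek to end-1     */
            fputc(0, f);                    /* write one byte    */
        }
        fclose(f);                          /* close ...         */
        return fopen(name, "r+b");          /* ... and reopen    */
    }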

I may be seriously off-base here (no surprise), but I felt I should
mention it "just in case"!


On 7/7/11, Michael B. Brutman <mbbrut...@brutman.com> wrote:
> On 7/6/2011 7:10 PM, Bernd Blaauw wrote:
>> Op 7-7-2011 1:32, Michael B. Brutman schreef:
>>> mTCP FTP compares poorly to the native stack and FireFox there, but FTP
>>> is working in a very limited environment:
>>>
>>>    * The TCP/IP socket receive buffer is tiny compared to the native
>>>      network stack
>>>    * You are doing filesystem writes 8KB at a time
>>>    * You have a small virtualization penalty
>>>    * The packet interface wasn't designed for high performance; every
>>>      incoming packet requires a hardware interrupt and two software
>>>      interrupts
>>
>> 8KB filesystem writes? odd. So it's:
>> 1) download/transfer 8KB (8KB transfer buffer)
>> 2) halt download, dump transfer buffer to disk and clear it
>> 3) continue downloading.
>
> Not so odd.  All comm code fills buffers and then processes the
> buffers.  Unless you have a multi-core system you are always halting
> TCP/IP protocol processing to do your disk writes.  Modern OSes with
> DMA support hide some of that by letting the disk's DMA controller
> (and possibly the Ethernet controller, if so equipped) do the
> byte-copying work.  But in the absence of DMA the host CPU does
> everything, and does it in a single-threaded manner.
>
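
If I follow, the core of it is one thread alternating between the
stack and the disk, roughly like this (a sketch with invented names,
not the actual mTCP code):

    #include <stdio.h>

    /* hypothetical primitives standing in for the real stack */
    extern void tcp_drive(void);            /* process packets    */
    extern int  tcp_recv(void *buf, unsigned max);

    static unsigned char chunk[8192];       /* 8KB write chunk    */

    /* While fwrite() is busy, no TCP/IP protocol processing
       happens at all - the stack is simply not being driven.     */
    void receive_file(FILE *out)
    {
        int n;
        for (;;) {
            tcp_drive();
            n = tcp_recv(chunk, sizeof chunk);
            if (n < 0)
                break;                      /* connection closed  */
            if (n > 0)
                fwrite(chunk, 1, (unsigned) n, out);
        }
    }
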
>> At least that's easier than having an 8KB transfer buffer plus a
>> 'huge' receive buffer (nearly the size of the machine's conventional
>> memory, a multiple of 8KB?) and only clearing the buffer when it's
>> full or a file has been downloaded completely (whichever comes
>> first). Your single buffer might be more efficient than a transfer
>> buffer plus a receive buffer.
>>
> In this environment we are entirely single threaded, except for the
> hardware buffering that happens on the Ethernet card.  To receive a
> packet the path looks like this:
>
> - the card receives and buffers the frame from the wire
> - the card signals a hardware interrupt
> - the packet driver responds and either interrogates the card or copies
> the contents of the frame
> - the packet driver makes an upcall to the TCP/IP code
> - the TCP/IP code either provides a buffer or says 'no room'
> - the packet driver makes a second upcall to let the TCP/IP code know
> the frame is copied
> - the interrupt ends and the interrupted code resumes
> - the packet must now go through IP and TCP protocol processing
>
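
For what it's worth, that two-upcall dance is the packet driver
spec's receiver callback.  Behind the asm glue it amounts to roughly
this (a sketch: the real interface passes everything in registers,
and the buffer-pool helpers are invented):

    /* hypothetical buffer-pool helpers */
    extern unsigned char *buf_alloc(unsigned len);
    extern void buf_ready(unsigned len);

    /* Receiver upcall, made twice per frame:
       flag == 0: driver asks "where do I copy 'len' bytes?"
       flag == 1: driver says the copy into that buffer is done   */
    unsigned char *receiver_upcall(int flag, unsigned len)
    {
        if (flag == 0)
            return buf_alloc(len);   /* NULL here means 'no room';
                                        the frame is dropped      */
        buf_ready(len);              /* queue frame for TCP/IP    */
        return 0;
    }
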
> The buffering scheme works at three levels:
>
> - Raw packet buffers (20 at 1514 bytes)
> - TCP receive buffering (8KB)
> - File read/write chunk size (8KB)
>
> Raw packet buffers are used by the packet driver directly.  They are the
> critical resource; if you run out of those you start dropping frames
> coming in from the wire.  TCP buffering is designed to pull data from
> those packet buffers as quickly as possible so that they may be
> recycled.  (In the case where you have a lot of small incoming packets
> that is really critical because every incoming packet is allocated 1514
> bytes whether it needs it or not.)  The TCP buffer is organized as a
> ring buffer so it is more space efficient.
>
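
So the 8KB TCP buffer is a classic byte ring, something like this
(sketch, my names):

    #define RCV_SIZE 8192u

    static unsigned char rcv[RCV_SIZE];
    static unsigned head, count;      /* the reader side keeps a
                                         'tail' index of its own  */

    /* Copy n bytes out of a 1514-byte packet buffer into the
       ring so the packet buffer can be recycled right away.      */
    unsigned ring_put(const unsigned char *src, unsigned n)
    {
        unsigned done = 0;
        while (done < n && count < RCV_SIZE) {
            rcv[head] = src[done++];
            head = (head + 1) % RCV_SIZE;
            count++;
        }
        return done;     /* a short count means the ring is full  */
    }
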
> The application reads from the TCP buffer and writes to the filesystem.
> All of this is still single threaded, and for most systems the
> bottleneck is the disk access time, not the copying of data between
> multiple buffers.  At a minimum all reads and writes to the filesystem
> should be done in multiples of 512 bytes; anything less requires DOS to
> do a read/modify/write as it writes data to the blocks of the
> filesystem.  1KB reads and writes were still quite inefficient - after
> some experimenting I found that 8KB was good.  16 or 32KB are
> marginally better.
>
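
That matches what I'd expect: DOS writes whole 512-byte sectors, so
buffering up to a sector multiple avoids the read/modify/write
(sketch, invented names):

    #include <stdio.h>

    #define SECTOR 512
    #define CHUNK  (16 * SECTOR)        /* 8KB                    */

    static unsigned char chunk[CHUNK];
    static unsigned fill;               /* bytes buffered so far  */

    /* Hand DOS only full 8KB chunks; anything that is not a
       multiple of 512 bytes makes DOS read the last sector back,
       patch it, and write it out again.                          */
    void buffered_write(FILE *out, const unsigned char *p, unsigned n)
    {
        while (n--) {
            chunk[fill++] = *p++;
            if (fill == CHUNK) {
                fwrite(chunk, 1, CHUNK, out);
                fill = 0;
            }
        }
        /* the sub-sector tail gets flushed once, at end of file  */
    }
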
> The buffer sizes generally are not larger because larger does not make
> that much of a difference in the performance of the filesystem writes
> and does have a negative impact on buffering.  Long writes delay TCP
> protocol processing, causing incoming buffers to run low and delaying
> the sending of ACK packets for the received data.  The major opportunity
> for improving performance is to send the ACK for the packet as quickly
> as possible, right after TCP goes through protocol processing but before
> the application tries to read the receive buffer and empty it.  Getting
> that ACK out early keeps the flow of data constant and hides some of
> the latency of the 'receive data/write data' cycle.
>
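
So per received segment the ordering is essentially (sketch, my
names):

    /* hypothetical stack internals */
    extern void tcp_process(void);   /* checksum, seq, enqueue    */
    extern void send_ack(void);
    extern void app_drain(void);     /* the slow disk write       */

    /* ACK first, disk second: the sender keeps transmitting
       while we sit waiting on the drive.                         */
    void on_segment(void)
    {
        tcp_process();
        send_ack();
        app_drain();
    }
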
> Another optimization I could make to this would be to have the FTP
> application handle the raw packet buffers directly.  TCP would continue
> to do protocol processing, but instead of copying the data to a TCP
> receive buffer it would just give FTP the raw packets and let FTP do the
> copying.  This was the original design and the first netcat code used
> this technique.  The technique removes some memcpy overhead, but the
> largest overhead comes from the filesystem write.  It made the end
> application code (FTP) more complex and error-prone, and had a nasty
> habit of starving the packet driver of buffers if the disk write was
> too large.
>
> Most of my testing is on lower-end machines, like a 386-40 and the
> various 8088 machines that I have.  The performance of memcpy is far
> better on the newer processors due to pipeline efficiency and levels of
> caching.  (Even the 386-40 has a 128K L2 cache.)  Disk becomes the major
> bottleneck on the faster machines.
