On 7/6/2011 7:10 PM, Bernd Blaauw wrote:
> Op 7-7-2011 1:32, Michael B. Brutman schreef:
>> mTCP FTP compares poorly to the native stack and FireFox there, but FTP
>> is working in a very limited environment:
>>
>>    * The TCP/IP socket receive buffer is tiny compared to the native
>>      network stack
>>    * You are doing filesystem writes 8KB at a time
>>    * You have a small virtualization penalty
>>    * The packet interface wasn't designed for high performance; every
>>      incoming packet requires a hardware interrupt and two software
>>      interrupts
> I'm happy with whatever I can get. My real hardware has an Nvidia
> chipset network driver for which no packet drivers exist, so sticking to
> virtual machines. I wonder if any PCI (or even PCI-express or onboard)
> network cards still support packet drivers.
>
> 8KB filesystem writes? odd. So it's:
> 1) download/transfer 8KB (8KB transfer buffer)
> 2) halt download, dump transfer buffer to disk and clear it
> 3) continue downloading.

Not so odd.  All comm code fills buffers and then processes the 
buffers.  Unless you have a multi-core system you are always halting 
TCP/IP protocol processing to do your disk writes.  Modern OSes with 
DMA support hide some of that by letting the DMA controller of the 
disk (and possibly the Ethernet controller, if so equipped) do the 
byte copying work.  But in the absence of DMA the host CPU does 
everything, and does it in a single-threaded manner.

> Easier at least compared to having a 8KB transfer buffer plus a 'huge'
> receive buffer (nearly size of all of machine's conventional memory, a
> multiple of 8KB?) followed by only clearing the buffer if it's full or a
> file has been downloaded completely (whichever comes first). Your single
> buffer might be more efficient compared to transfer buffer plus receive
> buffer.
>
> Or perhaps I should stay silent hehe, failed miserably while learning
> about OSI layers and TCP/IP.

I suspect that the Intel chipsets on PCI-X cards will work; their PCI 
chipset cards had working packet drivers and from a software standpoint 
PCI-X is identical to PCI.  On the wiki at the Google project page I 
have three different PCI cards listed that are known to work.  (And I'd 
like to hear about more.)

In this environment we are entirely single threaded, except for the 
hardware buffering that happens on the Ethernet card.  To receive a 
packet the path looks like this:

- the card receives and buffers the frame from the wire
- the card signals a hardware interrupt
- the packet driver responds and either interrogates the card or copies 
the contents of the frame
- the packet driver makes an upcall to the TCP/IP code
- the TCP/IP code either provides a buffer or says 'no room'
- the packet driver makes a second upcall to let the TCP/IP code know 
the frame is copied
- the interrupt ends and the interrupted code resumes
- the packet must now go through IP and TCP protocol processing

The buffering scheme works at three levels:

- Raw packet buffers (20 at 1514 bytes)
- TCP receive buffering (8KB)
- File read/write chunk size (8KB)

Raw packet buffers are used by the packet driver directly.  They are the 
critical resource; if you run out of those you start dropping frames 
coming in from the wire.  TCP buffering is designed to pull data from 
those packet buffers as quickly as possible so that they may be 
recycled.  (In the case where you have a lot of small incoming packets 
that is really critical because every incoming packet is allocated 1514 
bytes whether it needs it or not.)  The TCP buffer is organized as a 
ring buffer so it is more space efficient.

The application reads from the TCP buffer and writes to the filesystem.  
All of this is still single threaded, and for most systems the 
bottleneck is the disk access time, not the copying of data between 
buffers.  At a minimum, all reads and writes to the filesystem should be 
done in multiples of 512 bytes; anything less requires DOS to do a 
read/modify/write as it updates the blocks of the filesystem.  1KB reads 
and writes were quite inefficient; after some experimenting I found that 
8KB was good, and 16 or 32KB is only marginally better.

The buffer sizes generally are not larger because larger does not make 
that much of a difference in the performance of the filesystem writes 
and does have a negative impact on buffering.  Long writes delay TCP 
protocol processing, causing incoming buffers to run low and delaying 
the sending of ACK packets for the received data.  The major opportunity 
for improving performance is to send the ACK for the packet as quickly 
as possible, right after TCP goes through protocol processing but before 
the application tries to read the receive buffer and empty it.  Getting 
that ACK out early keeps the flow of data constant and hides some of the 
latency of the 'receive data/write data' cycle that is occurring.

Another optimization I could make to this would be to have the FTP 
application handle the raw packet buffers directly.  TCP would continue 
to do protocol processing, but instead of copying the data to a TCP 
receive buffer it would just give FTP the raw packets and let FTP do the 
copying.  This was the original design and the first netcat code used 
this technique.  The technique removes some memcpy overhead, but the 
largest overhead comes from the filesystem write.  It also made the end 
application code (FTP) more complex and error-prone, and had a nasty 
habit of starving the packet driver of buffers if the disk write was 
too large.

Most of my testing is on lower-end machines, like a 386-40 and the 
various 8088 machines that I have.  The performance of memcpy is far 
better on the newer processors due to pipeline efficiency and levels of 
caching.  (Even the 386-40 has a 128K L2 cache.)  Disk becomes the major 
bottleneck on the faster machines.

>> But for older machines, it is more than adequate. I'll try a few tests
>> here - I also test under VMware, VirtualBox, and DOSBox but I never
>> thought to compare the performance to the native TCP/IP stack.
> It's more than adequate indeed, I'm not complaining hehe. Only wanted to
> start out by eliminating slow servers as potential bottleneck. The
> listed ftp://ftp.xs4all.nl/pub/test/10mb.bin should be same for everyone
> as it's intended by that ISP for speedtest purposes.
>

Just for comparison, on my local network here the AMD 80386-40 clone 
with an NE2000 adapter on a 10Mb segment is connected to a Pentium 233 
running Fedora 2.  My FTP receive speed on this network is 506KB/sec, 
which is similar to what you are reporting for your virtual machine.  My 
Pentium 133 running with a PCI Ethernet card receives at 1950KB/sec, 
which is almost four times faster than what you are reporting.  Your VM 
might be adding unnecessary overhead.


Mike



_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel