On 7/6/2011 7:10 PM, Bernd Blaauw wrote:
> On 7-7-2011 1:32, Michael B. Brutman wrote:
>> mTCP FTP compares poorly to the native stack and Firefox there, but FTP
>> is working in a very limited environment:
>>
>> * The TCP/IP socket receive buffer is tiny compared to the native
>>   network stack
>> * You are doing filesystem writes 8KB at a time
>> * You have a small virtualization penalty
>> * The packet interface wasn't designed for high performance; every
>>   incoming packet requires a hardware interrupt and two software
>>   interrupts
>
> I'm happy with whatever I can get. My real hardware has an Nvidia
> chipset network driver for which no packet drivers exist, so I'm
> sticking to virtual machines. I wonder if any PCI (or even PCI Express
> or onboard) network cards still support packet drivers.
>
> 8KB filesystem writes? Odd. So it's:
> 1) download/transfer 8KB (8KB transfer buffer)
> 2) halt download, dump the transfer buffer to disk and clear it
> 3) continue downloading.
Not so odd. All comm code fills buffers and then processes them. Unless
you have a multi-core system, you are always halting TCP/IP protocol
handling to do your disk writes. Modern OSes with DMA support hide some
of that by letting the disk's DMA controller (and possibly the Ethernet
controller, if so equipped) do the byte-copying work. But in the absence
of DMA the host CPU does everything, and does it single threaded.

> Easier at least compared to having an 8KB transfer buffer plus a 'huge'
> receive buffer (nearly the size of all of the machine's conventional
> memory, a multiple of 8KB?) followed by only clearing the buffer if
> it's full or a file has been downloaded completely (whichever comes
> first). Your single buffer might be more efficient compared to a
> transfer buffer plus a receive buffer.
>
> Or perhaps I should stay silent hehe, I failed miserably while learning
> about OSI layers and TCP/IP.

I suspect that the Intel chipsets on PCI-X cards will work; their PCI
chipset cards had working packet drivers, and from a software standpoint
PCI-X is identical to PCI. On the wiki at the Google project page I have
three different PCI cards listed that are known to work. (And I'd like
to hear about more.)

In this environment we are entirely single threaded, except for the
hardware buffering that happens on the Ethernet card.
To receive a packet, the path looks like this:

- the card receives and buffers the frame from the wire
- the card signals a hardware interrupt
- the packet driver responds and either interrogates the card or copies
  the contents of the frame
- the packet driver makes an upcall to the TCP/IP code
- the TCP/IP code either provides a buffer or says 'no room'
- the packet driver makes a second upcall to let the TCP/IP code know
  the frame is copied
- the interrupt ends and the interrupted code resumes
- the packet must now go through IP and TCP protocol processing

The buffering scheme works at three levels:

- Raw packet buffers (20 at 1514 bytes each)
- TCP receive buffer (8KB)
- File read/write chunk size (8KB)

Raw packet buffers are used by the packet driver directly. They are the
critical resource; if you run out of them you start dropping frames
coming in from the wire. TCP buffering is designed to pull data from
those packet buffers as quickly as possible so that they can be
recycled. (That is especially critical when you have a lot of small
incoming packets, because every incoming packet is allocated 1514 bytes
whether it needs them or not.) The TCP buffer is organized as a ring
buffer, so it is more space efficient. The application reads from the
TCP buffer and writes to the filesystem.

All of this is still single threaded, and for most systems the
bottleneck is the disk access time, not the copying of data between
buffers. At a minimum, all reads and writes to the filesystem should be
done in multiples of 512 bytes; anything less requires DOS to do a
read/modify/write as it writes data to the blocks of the filesystem.
1KB reads and writes were quite inefficient; after some experimenting I
found that 8KB was good, and 16KB or 32KB are only marginally better.
The buffers are not larger because larger sizes do not make much of a
difference to filesystem write performance and do have a negative
impact on buffering.
Long writes delay TCP protocol processing, causing incoming buffers to
run low and delaying the sending of ACK packets for the received data.
The major opportunity for improving performance is to send the ACK for
a packet as quickly as possible: right after TCP finishes protocol
processing, but before the application tries to read the receive buffer
and empty it. Getting that ACK out early keeps the flow of data
constant and hides some of the latency of the 'receive data/write data'
cycle.

Another optimization I could make would be to have the FTP application
handle the raw packet buffers directly. TCP would continue to do
protocol processing, but instead of copying the data to a TCP receive
buffer it would just hand FTP the raw packets and let FTP do the
copying. This was the original design, and the first netcat code used
this technique. It removes some memcpy overhead, but the largest
overhead comes from the filesystem write. It also made the end
application code (FTP) more complex and error prone, and had a nasty
habit of starving the packet driver for buffers if the disk write was
too large.

Most of my testing is on lower-end machines, like a 386-40 and the
various 8088 machines that I have. The performance of memcpy is far
better on newer processors due to pipeline efficiency and levels of
caching. (Even the 386-40 has a 128K L2 cache.) Disk becomes the major
bottleneck on the faster machines.

>> But for older machines, it is more than adequate. I'll try a few tests
>> here - I also test under VMware, VirtualBox, and DOSBox but I never
>> thought to compare the performance to the native TCP/IP stack.
>
> It's more than adequate indeed, I'm not complaining hehe. I only wanted
> to start out by eliminating slow servers as a potential bottleneck. The
> listed ftp://ftp.xs4all.nl/pub/test/10mb.bin should be the same for
> everyone, as it's intended by that ISP for speed-test purposes.
Just for comparison: on my local network here, the AMD 80386-40 clone
with an NE2000 adapter on a 10Mb segment is connected to a Pentium 233
running Fedora 2. My FTP receive speed on this network is 506KB/sec,
which is similar to what you are reporting for your virtual machine. My
Pentium 133 with a PCI Ethernet card receives at 1950KB/sec, which is
almost four times faster than what you are reporting. Your VM might be
adding unnecessary overhead.


Mike

_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel