Hi,

(Yes, I tried trimming the quote, but almost all of it seems vaguely
relevant, ugh.)
At the risk of embarrassing myself (again) by showing my ignorance ...
Mike, you said this: "Disk becomes the major bottleneck on the faster
machines." Doesn't FTP know the size of the file to be downloaded ahead
of time? If so, then perhaps you can try Eric's idea:

* create/open, seek to end-1 (in the empty output file), write a single
byte, close, reopen

This apparently avoids having to update the FAT over and over again
redundantly. See the following thread for some examples (esp. nidud's
comment about Doszip: "This reduced the compression time from 455 to
180 sec."):

http://www.bttr-software.de/forum/board_entry.php?id=8862#p8879

I may be seriously off-base here (no surprise), but I felt I should
mention it "just in case"!

On 7/7/11, Michael B. Brutman <mbbrut...@brutman.com> wrote:
> On 7/6/2011 7:10 PM, Bernd Blaauw wrote:
>> On 7-7-2011 1:32, Michael B. Brutman wrote:
>>> mTCP FTP compares poorly to the native stack and FireFox there, but
>>> FTP is working in a very limited environment:
>>>
>>> * The TCP/IP socket receive buffer is tiny compared to the native
>>> network stack
>>> * You are doing filesystem writes 8KB at a time
>>> * You have a small virtualization penalty
>>> * The packet interface wasn't designed for high performance; every
>>> incoming packet requires a hardware interrupt and two software
>>> interrupts
>>
>> 8KB filesystem writes? Odd. So it's:
>> 1) download/transfer 8KB (8KB transfer buffer)
>> 2) halt download, dump transfer buffer to disk and clear it
>> 3) continue downloading.
>
> Not so odd. All comm code fills buffers and then processes the
> buffers. Unless you have a multi-core system you are always halting
> the processing of TCP/IP protocol handling to do your disk writes.
> Modern OSes with DMA support hide some of that by letting the DMA
> controller of the disk (and possibly the Ethernet controller, if so
> equipped) do the byte copying work.
> But in the absence of DMA the host CPU does everything, and does it
> in a single-threaded manner.
>
>> Easier at least compared to having an 8KB transfer buffer plus a
>> 'huge' receive buffer (nearly the size of all of the machine's
>> conventional memory, a multiple of 8KB?) followed by only clearing
>> the buffer if it's full or a file has been downloaded completely
>> (whichever comes first). Your single buffer might be more efficient
>> compared to a transfer buffer plus a receive buffer.
>
> In this environment we are entirely single threaded, except for the
> hardware buffering that happens on the Ethernet card. To receive a
> packet the path looks like this:
>
> - the card receives and buffers the frame from the wire
> - the card signals a hardware interrupt
> - the packet driver responds and either interrogates the card or
> copies the contents of the frame
> - the packet driver makes an upcall to the TCP/IP code
> - the TCP/IP code either provides a buffer or says 'no room'
> - the packet driver makes a second upcall to let the TCP/IP code
> know the frame is copied
> - the interrupt ends and the interrupted code resumes
> - the packet must now go through IP and TCP protocol processing
>
> The buffering scheme works at three levels:
>
> - Raw packet buffers (20 at 1514 bytes)
> - TCP receive buffering (8KB)
> - File read/write chunk size (8KB)
>
> Raw packet buffers are used by the packet driver directly. They are
> the critical resource; if you run out of those you start dropping
> frames coming in from the wire. TCP buffering is designed to pull
> data from those packet buffers as quickly as possible so that they
> may be recycled. (In the case where you have a lot of small incoming
> packets that is really critical, because every incoming packet is
> allocated 1514 bytes whether it needs it or not.) The TCP buffer is
> organized as a ring buffer so it is more space efficient.
>
> The application reads from the TCP buffer and writes to the filesystem.
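For anyone curious, the ring-buffer organization described above can be
sketched roughly like this. This is a minimal illustration in C, not
mTCP's actual code; all of the names (Ring, ring_put, ring_get, etc.)
are made up for the example:

```c
#include <stddef.h>

/* Minimal byte ring buffer: a sketch of how an 8KB TCP receive buffer
 * can be organized so that space freed by the consumer is reused
 * immediately, without shuffling data around. */
#define RING_SIZE 8192

typedef struct {
    unsigned char data[RING_SIZE];
    size_t head;   /* next slot to write (producer: TCP processing)  */
    size_t tail;   /* next slot to read  (consumer: the application) */
    size_t count;  /* bytes currently buffered                       */
} Ring;

static void ring_init(Ring *r) { r->head = r->tail = r->count = 0; }

static size_t ring_free(const Ring *r) { return RING_SIZE - r->count; }

/* Copy up to len bytes in; returns bytes accepted (0 means 'no room'). */
static size_t ring_put(Ring *r, const unsigned char *src, size_t len) {
    size_t n = len < ring_free(r) ? len : ring_free(r);
    for (size_t i = 0; i < n; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_SIZE;
    }
    r->count += n;
    return n;
}

/* Copy up to len bytes out; returns bytes delivered. */
static size_t ring_get(Ring *r, unsigned char *dst, size_t len) {
    size_t n = len < r->count ? len : r->count;
    for (size_t i = 0; i < n; i++) {
        dst[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    r->count -= n;
    return n;
}
```

The point of the ring layout is that the producer (TCP, draining the
raw packet buffers) and the consumer (the application's disk writes)
can each advance their own index without ever compacting the buffer.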
> All of this is still single threaded, and for most systems the
> bottleneck is the disk access time, not the copying of data between
> multiple buffers. At a minimum, all reads and writes to the
> filesystem should be done in multiples of 512 bytes; anything less
> requires DOS to do a read/modify/write as it writes data to the
> blocks of the filesystem. 1KB reads and writes were very inefficient;
> after some experimenting I found that 8KB was good. 16 or 32KB are
> marginally better.
>
> The buffer sizes generally are not larger because larger does not
> make that much of a difference in the performance of the filesystem
> writes, and does have a negative impact on buffering. Long writes
> delay TCP protocol processing, causing incoming buffers to run low
> and delaying the sending of ACK packets for the received data. The
> major opportunity for improving performance is to send the ACK for
> the packet as quickly as possible, right after TCP goes through
> protocol processing but before the application tries to read the
> receive buffer and empty it. Getting that ACK out early keeps the
> flow of data constant and hides some of the latency of the 'receive
> data/write data' cycle that is occurring.
>
> Another optimization I could make would be to have the FTP
> application handle the raw packet buffers directly. TCP would
> continue to do protocol processing, but instead of copying the data
> to a TCP receive buffer it would just give FTP the raw packets and
> let FTP do the copying. This was the original design, and the first
> netcat code used this technique. The technique removes some memcpy
> overhead, but the largest overhead comes from the filesystem write.
> It made the end application code (FTP) more complex and error prone,
> and had a nasty habit of starving the packet driver for buffers if
> the disk write was too large.
>
> Most of my testing is on lower-end machines, like a 386-40 and the
> various 8088 machines that I have.
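The "write in multiples of 512 bytes" point above can be illustrated
with a small staging-buffer sketch. This is plain C for readability,
not mTCP's code; a real DOS client would write via INT 21h, and the
Stager/stage_bytes names are invented for the example:

```c
#include <stddef.h>
#include <stdio.h>

/* Sketch: accumulate received bytes and flush to disk only in full
 * 8KB chunks (a multiple of the 512-byte sector size), so DOS never
 * has to read-modify-write a partial filesystem block mid-transfer. */
#define SECTOR 512
#define CHUNK  8192          /* 16 sectors; the size the author found good */

typedef struct {
    unsigned char buf[CHUNK];
    size_t fill;             /* bytes currently staged */
} Stager;

/* Stage incoming bytes; flush each 8KB chunk the moment it fills. */
static void stage_bytes(Stager *s, FILE *out,
                        const unsigned char *src, size_t len) {
    while (len > 0) {
        size_t room = CHUNK - s->fill;
        size_t n = len < room ? len : room;
        for (size_t i = 0; i < n; i++)
            s->buf[s->fill + i] = src[i];
        s->fill += n; src += n; len -= n;
        if (s->fill == CHUNK) {          /* full chunk: one efficient write */
            fwrite(s->buf, 1, CHUNK, out);
            s->fill = 0;
        }
    }
}

/* At end of file, flush whatever remains: the only sub-sector write. */
static void stage_finish(Stager *s, FILE *out) {
    if (s->fill > 0) {
        fwrite(s->buf, 1, s->fill, out);
        s->fill = 0;
    }
}
```

Only the final flush can be a partial sector, so the read/modify/write
penalty is paid at most once per file instead of once per write.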
> The performance of memcpy is far better on the newer processors due
> to pipeline efficiency and levels of caching. (Even the 386-40 has a
> 128K L2 cache.) Disk becomes the major bottleneck on the faster
> machines.

_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
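For reference, the create/seek/write-one-byte preallocation trick from
Eric's idea at the top of this message could look roughly like this in
portable C. This is only an illustration (a real DOS build would use
INT 21h calls, and the path name below is just an example); on FAT the
point is that closing after the seek and single-byte write forces the
whole cluster chain to be allocated once, instead of being extended on
every 8KB write:

```c
#include <stdio.h>

/* Preallocate a file of the given size: create it, seek to size-1,
 * write a single byte, then close so the directory entry and FAT are
 * updated once. (On non-FAT filesystems this may create a sparse file
 * instead of allocating real blocks.) Returns 0 on success, -1 on
 * failure. */
static int preallocate(const char *path, long size) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    if (size > 0) {
        if (fseek(f, size - 1, SEEK_SET) != 0 || fputc(0, f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}
```

After reopening, the downloader would seek back to offset 0 and
overwrite the file in place, so none of the later writes have to grow
the file.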