On Sat, 2009-04-11 at 07:48 -0500, Jon Nelson wrote:
> I *would* suggest that if dd is to be used, that 'conv=fdatasync' be
> used - this tells dd to issue an fdatasync(2) call before closing -
> this forces the block layer to write out all *cached* data for this
> file. Otherwise, dd may have written 1G and the block layer is caching
> 300MB or more - but writing to the cache is writing to memory and
> therefore the performance values can be rather exaggerated.
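(For anyone following along: conv=fdatasync makes dd do the moral
equivalent of the code below before it closes the output file. This is
a minimal C sketch, not dd's actual source; the path is made up and
error handling is trimmed.)

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical target file; stands in for dd's of= argument */
        int fd = open("/mnt/aoe/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;

        char buf[4096] = {0};
        for (int i = 0; i < 1024; i++)   /* ~4MB of buffered writes */
            if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
                return 1;

        /* This is what conv=fdatasync adds: force the cached data for
           this file out to the device before we stop the clock. */
        if (fdatasync(fd) != 0)
            return 1;

        return close(fd);
    }

Without the fdatasync(), the timing only covers getting the data into
the page cache, which is exactly the exaggeration Jon describes.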
Agreed. Testing a blade with O_DIRECT is just silly; in real-world usage
you're going to have buffers on both ends, and very rarely will you
bypass them. If an RDBMS uses O_DIRECT, it's because it has implemented
its own cache. If you are not implementing your own cache and just want
to advise the kernel on how the file should be treated, we have
posix_fadvise() and posix_madvise(); unfortunately, vblade doesn't offer
an interface to the former.

The only problem with fdatasync() and friends comes from device mapper
itself: it does not pass write barriers down to logical volumes that
span multiple physical devices, so the write cache on the drive(s)
becomes an issue. But I don't think most people disable the write cache
in real-world testing; doing so would be almost as silly as using
O_DIRECT while trying to get real-world results :)

So, if you fit the device-mapper criteria above, fsync() is currently
useless: write barrier requests just return "yeah, I did it", and your
numbers are therefore equally useless if data integrity is your eventual
goal. But the fact remains that in the real world there are buffers (and
battery-backed RAID cards ...).

AoE's only downfall is the size of the Ethernet frame, as others have
said (and buffers accounting for that frame size accordingly).

It comes down to this: are you testing for speed, or for speed plus your
data actually being written under normal use? I can only suggest looking
at the protocol and its limitations, while realizing that in normal
circumstances you'll let Linux be Linux.

I'll probably get a dozen flames over this.

Cheers,
--Tim

> > I don't think iSCSI can give better throughput, because it has
> > TCP/IP overhead.
>
> It sure can, because TCP/IP has very sophisticated mechanisms (window
> scaling, pluggable algorithms) to handle packet loss, fairness,
> stability, etc. I have seen nearly perfect TCP/IP throughput on gig-e
> links (well north of 120MB/s) and have never gotten anywhere near that
> with AoE. I get /substantially/ better I/O with nbd (a TCP/IP-based
> network block device on Linux) than I do with AoE, for just this
> reason. This is not to knock AoE!! I feel that AoE is a perfectly
> appropriate, even ideal, solution for the set of problems that it
> solves, but to say that iSCSI could not possibly be faster than AoE is
> to forget about the decades of work that has gone into making TCP/IP
> scale and perform.
>
> --
> Jon
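P.S. Since I pointed at posix_fadvise() above: for anyone who wants to
experiment, the call looks roughly like this. A minimal sketch under my
own assumptions; the backing-file path and the advice flag are just
illustrative, not anything vblade actually does today.

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical backing file for an AoE export */
        int fd = open("/srv/aoe/e0.0", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Advise the kernel that we'll read the whole file sequentially
           (offset 0, len 0 == to EOF) so it can read ahead aggressively.
           Purely advisory; the kernel is free to ignore it. Note it
           returns an errno value directly rather than setting errno. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %d\n", err);

        return close(fd);
    }

That kind of hint is what vblade could apply to its backing file if it
ever grew an option for it.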