On Sat, 2009-04-11 at 07:48 -0500, Jon Nelson wrote:

> 
> I *would* suggest that if dd is to be used, 'conv=fdatasync' be
> used - this tells dd to issue an fdatasync(2) call before closing,
> which forces the block layer to write out all *cached* data for this
> file. Otherwise dd may have written 1G while the block layer is still
> caching 300MB or more - but writing to the cache is writing to
> memory, and therefore the performance numbers can be rather
> exaggerated.

Agreed. Testing against a blade with O_DIRECT is just silly; in
real-world usage you're going to have buffers on both ends, and you'll
very rarely bypass them.
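
For anyone who wants it spelled out, conv=fdatasync boils down to
roughly the following (a minimal C sketch, not dd's actual source;
the path and sizes are made up):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* hypothetical target; stand-in for dd's of= argument */
        int fd = open("/tmp/ddtest", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        static char buf[1 << 16];       /* 64 KiB, zero-filled */

        /* write 1 GiB; each write() normally just dirties page cache */
        for (unsigned i = 0; i < (1U << 30) / sizeof buf; i++)
                if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
                        perror("write");
                        return 1;
                }

        /* what conv=fdatasync adds: flush the cached data to the
         * device before close(), so the timing reflects real I/O */
        if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

        close(fd);
        return 0;
}

Without the fdatasync() call the program can exit with hundreds of MB
still sitting dirty in the page cache, which is exactly the
exaggeration Jon describes.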

If some RDBMS is using O_DIRECT, it's because it has implemented its
own cache. If you are not implementing your own cache and want to
advise the kernel on how a file should be treated, we have
posix_fadvise() and posix_madvise(); unfortunately, vblade doesn't
offer an interface to the former.
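
If it did, the call itself is trivial. A sketch of what such a hook
might look like (the RANDOM hint and the command-line interface are my
assumptions, not anything vblade actually does):

#define _XOPEN_SOURCE 600       /* for posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s <backing-file>\n", argv[0]);
                return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Hint that access will be random across the whole file
         * (offset 0, length 0 = to EOF), so the kernel scales back
         * readahead.  Note posix_fadvise() returns the error number
         * directly rather than setting errno. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
        if (err != 0)
                fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        close(fd);
        return 0;
}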

The only problem with fdatasync() and friends comes from device mapper
itself: it does not pass write barriers down to logical volumes that
span multiple physical devices, so the write cache on the drive(s)
becomes an issue.

But I don't think most people disable the drive write cache in
real-world testing; doing so would be almost as silly as using
O_DIRECT while trying to get real-world results :)

So, if you fit the device-mapper criteria above, fsync() is currently
useless: write barrier requests just return "yeah, I did it", and your
numbers are therefore equally useless if data integrity is your
eventual goal.

But the fact remains: in the real world, there are buffers (and
battery-backed RAID cards ...).

AoE's only real downfall is the size of the Ethernet frame, as others
have said (and the need to size buffers to account for that frame
size).
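
To see where you stand, the MTU is easy enough to query; a quick
sketch (the interface name "eth0" is just an example):

#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0) { perror("socket"); return 1; }

        struct ifreq ifr;
        memset(&ifr, 0, sizeof ifr);
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

        if (ioctl(s, SIOCGIFMTU, &ifr) < 0) { perror("ioctl"); return 1; }

        /* a standard 1500-byte MTU fits only a couple of 512-byte
         * sectors per AoE frame after headers; jumbo frames fit
         * considerably more */
        printf("%s MTU: %d\n", ifr.ifr_name, ifr.ifr_mtu);

        close(s);
        return 0;
}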

It comes down to this: are you testing for raw speed, or for speed
plus your data actually being written, as it would be under normal
use? I can only suggest looking at the protocol and its limitations,
while realizing that in normal circumstances you'll let Linux be
Linux.

I'll probably get a dozen flames over this.

Cheers,
--Tim


> 
>  
>         I don't think iSCSI can give better throughput, because it has
>         TCP/IP overhead.
> 
> 
> It sure can because TCP/IP has very sophisticated mechanisms (window
> scaling, pluggable algorithms) to handle packet loss, fairness,
> stability, etc.. I have seen nearly perfect TCP/IP throughput on gig-e
> links (well north of 120MB/s) and have never gotten anywhere near that
> with AoE. I get /substantially/ better I/O with nbd (a TCP/IP based
> network block device on Linux) than I do with AoE for just this
> reason. This is not to knock AoE!! I feel that AoE is a perfectly
> appropriate, even ideal solution for the set of problems that it
> solves, but to say that iSCSI could not possibly be faster than AoE is
> to forget about the decades of work that has gone into making TCP/IP
> scale and perform.
> 
> -- 
> Jon

