On Wed, 10 Sep 2008, Erwin, Brock A wrote:
> I understand that ceph uses sockets for client-osd and client-mds
> communication, so the details of how individual packets are sent
> are obscured - it is left to the kernel to decide how
> that works.
> 
> However, I am trying to understand why I am not getting 100% network
> saturation on a single client for writing to one very large file (i.e.
> one single stream of data).  Instead, I find that I am getting less than
> 25% utilization of the network interface that the client is connected
> to.  After noticing this, I proceeded to locate the source of the
> bottleneck.  I discovered that processor, disk, and network utilization
> on the OSDs, as well as the MDS, all remained below 25%.
> 
> Then I decided to do parallel writes to different files on the client (I
> wrote 20 different large files at the same time).  In this case, the
> network utilization on the client jumped to 100%.
> 
> Thus, I am wondering why a single stream of data cannot utilize the
> hardware fully.  I suspect this may have something to do with the
> way the kernel buffers data before it is sent over a socket.  Maybe
> larger amounts of data need to be sent at any given time, or maybe the
> ceph client needs to buffer its data until it reaches a particular size
> before writing to the open socket.  However, I am just thinking aloud
> at this point, since I do not completely understand how the client
> communicates its data to the OSDs.

I think the problem is ceph's writepages() implementation.  If all of your 
IO is to a single file (on a single client), there is only a single thread 
doing writeback, and writepage is synchronous (i.e. only one outstanding 
IO at a time).  That will never fully saturate the network link.

IIRC Evgeniy's POHMELFS had an async writepages(), which would fire off a 
bunch of async IOs with some appropriate throttling mechanism, and the 
completion callback would handle unlocking the appropriate pages and so 
forth.  I haven't looked into it yet, but that sounds like the way 
forward...

> P.S.  I was also wondering if you have any other kind of design
> documentation on Ceph to help me get a 'big picture' view of ceph.
> Architecture diagrams or component diagrams maybe?

The best high-level overview is probably still this (slightly dated) 
paper:
 http://www.usenix.org/events/osdi06/tech/weil.html

There's also my thesis:
 http://ceph.newdream.net/weil-thesis.pdf

and some other papers covering specifics (data placement, distributed 
object storage) at
 http://ceph.newdream.net/publications

sage

