On Wed, 10 Sep 2008, Erwin, Brock A wrote:
> I understand that ceph uses sockets for client-osd and client-mds
> communication, so the details of the implementation of sending
> individual packets are obscured - it is left to the kernel to decide how
> that works.
>
> However, I am trying to understand why I am not getting 100% network
> saturation on a single client for writing to one very large file (i.e.
> one single stream of data). Instead, I find that I am getting less than
> 25% utilization of the network interface that the client is connected
> to. After noticing this, I proceeded to locate the source of the
> bottleneck. I discovered that processor, disk, and network utilization
> on the OSDs and the MDS all remained below 25% as well.
>
> Then I decided to do parallel writes to different files on the client (I
> wrote 20 different large files at the same time). In this case, the
> network utilization on the client jumped to 100%.
>
> Thus, I am wondering why a single stream of data cannot utilize the
> hardware fully. I am thinking this may have something to do with the
> way the kernel buffers data before it is sent over a socket. Maybe
> larger amounts of data need to be sent at any given time, or maybe the
> ceph client needs to buffer its data until it reaches a particular size
> before it writes to the open socket. However, I am just thinking aloud
> at this point, given that I do not completely understand how the
> client communicates its data to the OSDs.
I think the problem is ceph's writepages() implementation. If all of your
IO is to a single file (on a single client), there is only a single thread
doing writeback, and writepage is synchronous (i.e. only one outstanding
IO at a time). That will never fully saturate the network link.

IIRC Evgeniy's POHMELFS had an async writepages(), which would fire off a
bunch of async IOs with some appropriate throttling mechanism, and the
completion callback would handle unlocking the appropriate pages and so
forth. I haven't looked into it yet, but that sounds like the way
forward...

> P.S. I was also wondering if you have any other kind of design
> documentation on Ceph to help me get a 'big picture' view of ceph.
> Architecture diagrams or component diagrams maybe?

The best high-level overview is probably still this (slightly dated) paper:

	http://www.usenix.org/events/osdi06/tech/weil.html

There's also my thesis:

	http://ceph.newdream.net/weil-thesis.pdf

and some other papers covering specifics (data placement, distributed
object storage) at

	http://ceph.newdream.net/publications

sage
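The throughput difference between one-outstanding-IO writeback and throttled async writeback is easy to demonstrate outside the kernel. Below is a toy user-space sketch (Python, not Ceph or kernel code; `IO_LATENCY`, `write_pages`, and `max_in_flight` are all invented names for illustration). Each fake IO just sleeps for a fixed round-trip latency; a semaphore plays the role of the throttling mechanism, and releasing it on completion stands in for the completion callback that would unlock the written pages:

```python
import threading
import time

IO_LATENCY = 0.01  # seconds per fake round trip to an OSD (invented figure)

def write_pages(num_ios, max_in_flight):
    """Issue num_ios fake IOs with at most max_in_flight outstanding.
    Returns elapsed wall-clock time."""
    throttle = threading.Semaphore(max_in_flight)  # throttling mechanism
    threads = []
    start = time.monotonic()
    for _ in range(num_ios):
        throttle.acquire()          # block while too many IOs are in flight

        def one_io():
            time.sleep(IO_LATENCY)  # stand-in for the network round trip
            throttle.release()      # "completion callback": free a slot

        t = threading.Thread(target=one_io)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return time.monotonic() - start

# max_in_flight=1 models the current synchronous writepage path;
# max_in_flight=8 models an async writepages() with throttling.
sync_time = write_pages(20, max_in_flight=1)
async_time = write_pages(20, max_in_flight=8)
print(f"sync: {sync_time:.3f}s  async: {async_time:.3f}s")
```

With 20 IOs, the synchronous variant pays the full latency twenty times in series, while the async variant overlaps up to eight round trips and finishes several times faster. In the real writepages() path the completion path would additionally end writeback on and unlock the pages it covered.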