Hi Sage,

As you advised, we switched to release 0.19 of Ceph and ran into another bug in the Ceph client. When writing to a file opened with the O_SYNC flag, write() always returns 0, even though the data is written to disk. This is a problem for our benchmark, which uses the return value as the number of bytes written. The behavior also appears to violate the POSIX write() contract, under which a successful write returns the number of bytes actually written.
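To make the check concrete, here is a minimal sketch of what we compare against. It is not the attached temp.cc; the helper name count_bad_sync_writes and the one-byte-per-write() loop are assumptions made only for illustration. It opens a file with O_SYNC and counts the write() calls whose return value differs from the number of bytes requested:

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not taken from temp.cc): writes `len` copies of `fill`
// to `path`, one byte per write() call, on a descriptor opened with O_SYNC.
// Returns the number of write() calls that did not return 1, or -1 if the
// file could not be opened.
int count_bad_sync_writes(const char* path, std::size_t len, char fill) {
    assert(path != nullptr);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) {
        std::fprintf(stderr, "open(%s): %s\n", path, std::strerror(errno));
        return -1;
    }

    int bad = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // A successful one-byte write() should report exactly 1 byte written.
        ssize_t n = write(fd, &fill, 1);
        if (n != 1) {
            ++bad;
            std::fprintf(stderr, "write #%zu returned %zd\n", i, n);
        }
    }

    close(fd);
    return bad;
}
```

Called as count_bad_sync_writes("./test", 100, '1') from a directory on the file system under test, this should return 0 wherever write() reports the bytes it wrote.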
Attached is a small unit test in C++. The test creates two identical files, both filled with random digits 0-9. Both files are then closed, and one of them is reopened and filled with 1's.

Running the test:

$ g++ temp.cc
$ ./a.out 100

(100 is the number of bytes in the files.)

Each time 0 is returned, it is printed on the screen. Run the executable a.out from within a directory on a Ceph file system. After the program finishes you will find two files:

./test - filled with ones
./test.start - filled with random numeric data

If you run this test on NFS and on Ceph, you will see that no errors are printed on the NFS file system, and 100 errors are printed on Ceph.

Thanks,
Roman & Roman

-----Original Message-----
From: Sage Weil [mailto:s...@newdream.net]
Sent: Friday, February 19, 2010 8:39 PM
To: Talyansky, Roman
Cc: ceph-devel@lists.sourceforge.net
Subject: Re: [ceph-devel] Write operation is stuck

Hi Roman,

On Fri, 19 Feb 2010, Talyansky, Roman wrote:
> Since I test several ceph versions simultaneously I could confuse the error
> checking at different nodes.
> I'll double check this and let you know.

Thanks. If you haven't switched to the just-released 0.19, now might be the time to do that.

> > It also looks like the IO is synchronous, which may have something
> > to do with your performance. Are you mounting with -o sync or using
> > direct IO, or are multiple clients reading and writing to the same file or
> > something?
>
> The IO is indeed synchronous. However the performance under ceph is much
> worse than even under nfs, which looks strange. I do not mount with -o
> synch. And in our experiments multiple clients read and write the same
> file.

If you are accessing the same file from multiple clients, then any comparison with nfs is going to be misleading. NFS provides only close-to-open consistency, so IO will be buffered and inconsistent.
Ceph provides fully consistent semantics by switching to synchronous IO when there are multiple clients. Ceph will be slower, but correct; nfs will be fast, but incorrect.

If your application is smart enough to handle its own consistency (each client is writing to a different region of the file) then you probably want something along the lines of O_LAZY [1], so that the application can tell the FS not to worry about consistency and stick with buffered IO. Unfortunately O_LAZY doesn't exist in Linux at this point. There is some preliminary support for it in Ceph... if that's what you're looking for, we can cook up some patches for you.

If you can find us in #ceph on irc.oftc.net that might be a quicker way to diagnose the performance problems with your workload.

Thanks!
sage

[1] http://www.pdl.cmu.edu/posix/docs/posix_lazy_io.pdf
Attachment: temp.cc
_______________________________________________ Ceph-devel mailing list Ceph-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ceph-devel