Thanks again. Even 'du' performance is terrible on node B (testing on a directory taken from Phoronix):
# time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/
73M     /storage/test9/installed-tests/pts/pgbench-1.5.1/

real    0m21.044s
user    0m0.010s
sys     0m0.067s

Reading the files from node B doesn't seem to help with subsequent
accesses in this case:

# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/ > /dev/null

real    1m47.650s
user    0m0.041s
sys     0m0.212s

# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/ > /dev/null

real    1m45.636s
user    0m0.042s
sys     0m0.214s

# time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1 > /dev/null

real    1m43.180s
user    0m0.069s
sys     0m0.236s

Of course, once I unmount the CephFS on node A everything gets as fast
as it can be.

Am I missing something obvious here? Yes, I could drop the Linux cache
as a 'fix', but that would drop the entire system's cache, which sounds
a bit extreme! :P Unless there is a way to drop the cache only for that
single dir...?

On Tue, Jun 16, 2015 at 12:15 PM, Gregory Farnum <g...@gregs42.com> wrote:
> On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen <negil...@gmail.com> wrote:
> > Thank you very much for your reply!
> >
> > Is there anything I can do to work around that? e.g. setting access
> > caps to be released after a short while? Or is there a command to
> > manually release access caps (so that I could run it in cron)?
>
> Well, you can drop the caches. ;)
>
> More generally, you're running into a specific hole here. If your
> clients are actually *accessing* the files then they should go into
> shared mode and this will be much faster on subsequent accesses.
>
> > This is quite a problem because we have several applications that
> > need to access a large number of files, and when we set them to work
> > on CephFS latency skyrockets.
>
> What kind of shared-file access do they have? If you have a bunch of
> files being shared for read I'd expect this to be very fast. If
> different clients are writing small amounts to them in round-robin
> then that's unfortunately not going to work well. :(
> -Greg
>
> > Thanks again and regards.
> >
> > On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum <g...@gregs42.com> wrote:
> >>
> >> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen <negil...@gmail.com> wrote:
> >> > Hello everyone,
> >> >
> >> > something very strange is driving me crazy with CephFS (kernel
> >> > driver). I copy a large directory on the CephFS from one node. If
> >> > I try to perform a 'time ls -alR' on that directory it gets
> >> > executed in less than one second. If I try to do the same
> >> > 'time ls -alR' from another node it takes several minutes. No
> >> > matter how many times I repeat the command, the speed is always
> >> > abysmal. The ls works fine on the node where the initial copy was
> >> > executed from. This happens with any directory I have tried, no
> >> > matter what kind of data is inside.
> >> >
> >> > After lots of experimenting I have found that in order to have
> >> > fast ls speed for that dir from every node I need to flush the
> >> > Linux cache on the original node:
> >> > echo 3 > /proc/sys/vm/drop_caches
> >> > Unmounting and remounting the CephFS on that node does the trick too.
> >> >
> >> > Anyone have a clue about what's happening here? Could this be a
> >> > problem with the writeback fscache for the CephFS?
> >> >
> >> > Any help appreciated! Thanks and regards. :)
> >>
> >> This is a consequence of the CephFS "file capabilities" that we use
> >> to do distributed locking on file states. When you copy the
> >> directory on client A, it has full capabilities on the entire tree.
> >> When client B tries to do a stat on each file in that tree, it
> >> doesn't have any capabilities. So it sends a stat request to the
> >> MDS, which sends a cap update to client A requiring it to pause
> >> updates on the file and share its current state. Then the MDS tells
> >> client A it can keep going and sends the stat to client B.
> >> So that's:
> >> B -> MDS
> >> MDS -> A
> >> A -> MDS
> >> MDS -> B | MDS -> A
> >> for every file you touch.
> >>
> >> I think the particular oddity you're encountering here is that
> >> CephFS generally tries not to make clients drop their "exclusive"
> >> access caps just to satisfy a stat. If you had client B doing
> >> something with the files (like reading them) you would probably see
> >> different behavior. I'm not sure if there's something effective we
> >> can do here or not (it's just a bunch of heuristics about when we
> >> should or should not drop caps), but please file a bug on the
> >> tracker (tracker.ceph.com) with this case. :)
> >> -Greg
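The four-hop exchange Greg describes happens once per file, so a stat-heavy
walk of a cold tree scales linearly with file count. A back-of-envelope
sketch (the RTT and file count below are illustrative assumptions, not
measurements from this cluster):

```python
# Rough model of the per-file cap revocation cost described above.
# All numbers here are assumptions for illustration, not measurements.
rtt_ms = 5.0            # assumed per-message client<->MDS latency
messages_per_stat = 4   # B -> MDS, MDS -> A, A -> MDS, MDS -> B
files = 3000            # assumed number of files in the tree

total_seconds = files * messages_per_stat * rtt_ms / 1000.0
print(f"~{total_seconds:.0f} s to stat {files} files")  # ~60 s
```

Even a few milliseconds of round-trip latency per message turns into
minutes over a few thousand files, which is consistent with the 'ls -laR'
timings above staying slow on every repeat while client A holds the caps.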
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com