The CephFS kernel mount blocks reads while another client has dirty data for the file in its page cache. The cache coherency rules look like:

state 1 - only one client has a file open for read/write. The client can use its page cache.
state 2 - multiple clients have a file open for read, and no client has it open for write. Clients can use their page caches.
state 3 - multiple clients have a file open for read/write. Clients are not allowed to use their page caches.

The behavior you saw is likely caused by a state transition from 1 to 3.
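You can see the transition with two kernel clients; a rough sketch (the mount point and file name here are just examples):

````
# client A: buffered writes pile up dirty pages in its page cache while
# it is the only client with the file open (state 1)
clientA$ dd if=/dev/zero of=/cephfs/shared.bin bs=4M count=1024 &

# client B: opening the same file while client A still has it open for
# write ends state 1; the MDS revokes the cache capabilities, client A
# has to flush its dirty pages, and client B's reads stall until that
# flush completes
clientB$ dd if=/cephfs/shared.bin of=/dev/null bs=4M
````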
On Fri, Mar 8, 2019 at 8:15 AM Gregory Farnum <[email protected]> wrote:
>
> In general, no, this is not an expected behavior.
>
> My guess would be that something odd is happening with the other clients you
> have to the system, and there's a weird pattern with the way the file locks
> are being issued. Can you be more precise about exactly what workload you're
> running, and get the output of the session list on your MDS while doing so?
> -Greg
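> For reference, something along these lines on the MDS host should dump the
> session list ("mds.a" is a placeholder for your MDS daemon name):
>
> ````
> # query the MDS admin socket for all connected client sessions
> ceph daemon mds.a session ls
> ````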
> On Wed, Mar 6, 2019 at 9:49 AM Andrew Richards
> <[email protected]> wrote:
>>
>> We discovered recently that our CephFS mount appeared to be halting reads
>> when writes were being synched to the Ceph cluster, to the point that it
>> was affecting applications.
>>
>> I also posted this as a Gist with embedded graph images to help illustrate:
>> https://gist.github.com/keeperAndy/aa80d41618caa4394e028478f4ad1694
>>
>> The following is the plain text from the Gist.
>>
>> First, details about the host:
>>
>> ````
>> $ uname -r
>> 4.16.13-041613-generic
>>
>> $ egrep 'xfs|ceph' /proc/mounts
>> 192.168.1.115:6789,192.168.1.116:6789,192.168.1.117:6789:/ /cephfs ceph
>> rw,noatime,name=cephfs,secret=<hidden>,rbytes,acl,wsize=16777216 0 0
>> /dev/mapper/tst01-lvidmt01 /rbd_xfs xfs
>> rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota 0 0
>>
>> $ ceph -v
>> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
>> (stable)
>>
>> $ cat /proc/net/bonding/bond1
>> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>>
>> Bonding Mode: adaptive load balancing
>> Primary Slave: None
>> Currently Active Slave: net6
>> MII Status: up
>> MII Polling Interval (ms): 100
>> Up Delay (ms): 200
>> Down Delay (ms): 200
>>
>> Slave Interface: net8
>> MII Status: up
>> Speed: 10000 Mbps
>> Duplex: full
>> Link Failure Count: 2
>> Permanent HW addr: e4:1d:2d:17:71:e1
>> Slave queue ID: 0
>>
>> Slave Interface: net6
>> MII Status: up
>> Speed: 10000 Mbps
>> Duplex: full
>> Link Failure Count: 1
>> Permanent HW addr: e4:1d:2d:17:71:e0
>> Slave queue ID: 0
>>
>> ````
>>
>> We had CephFS mounted alongside an XFS filesystem made up of 16 RBD images
>> aggregated under LVM as our storage targets. The link to the Ceph cluster
>> from the host is a mode 6 2x10GbE bond (bond1 above).
>>
>> We started capturing network counters from the Ceph cluster connection
>> (bond1) on the host using ifstat at its most granular setting of 0.1
>> (sampling every tenth of a second). We then ran various overlapping read
>> and write operations in separate shells on the same host to obtain samples
>> of how our different means of accessing Ceph handled this. We converted our
>> ifstat output to CSV and inserted it into a spreadsheet to visualize the
>> network activity.
>>
>> We found that the CephFS kernel mount did indeed appear to pause ongoing
>> reads when writes were being flushed from the page cache to the Ceph
>> cluster.
>>
>> We wanted to see if we could make this more pronounced, so we added a
>> 6Gbit-limit tc filter to the interface and re-ran our tests. This yielded
>> much lengthier delay periods in the reads while the writes were more slowly
>> flushed from the page cache to the Ceph cluster.
>>
>> A more restrictive 2Gbit-limit tc filter produced much lengthier delays of
>> our reads as the writes were synched to the cluster.
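>> (For anyone wanting to reproduce the shaping: a token bucket filter along
>> the lines of the following imposes this kind of cap; the burst and latency
>> values are illustrative, not our exact settings.)
>>
>> ````
>> # cap bond1 egress at 6 Gbit/s; use "rate 2gbit" for the more
>> # restrictive test
>> tc qdisc add dev bond1 root tbf rate 6gbit burst 6mb latency 50ms
>>
>> # remove the cap afterwards
>> tc qdisc del dev bond1 root
>> ````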
>>
>> When we tested the same I/O on the RBD-backed XFS file system on the same
>> host, we found a very different pattern. The reads seemed to be given
>> priority over the write activity, but the writes were only slowed; they
>> were not halted.
>>
>> Finally, we tested overlapping SMB client reads and writes to a Samba share
>> that used the userspace libcephfs-based VFS_Ceph module to produce the
>> share. In this case, while raw throughput was lower than that of the kernel
>> mount, the reads and writes did not interrupt each other at all.
>>
>> Is this expected behavior for the CephFS kernel drivers? Can a CephFS
>> kernel client really not read and write to the file system simultaneously?
>>
>> Thanks,
>> Andrew Richards
>> Senior Systems Engineer
>> Keeper Technology, LLC

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com