A CephFS kernel mount blocks reads while another client has dirty data in
its page cache. The cache coherency rules look like this:

state 1 - only one client has the file open for read/write. That client
can use its page cache.
state 2 - multiple clients have the file open for read, and no client has
it open for write. Clients can use their page caches.
state 3 - multiple clients have the file open for read/write. Clients are
not allowed to use their page caches.

The behavior you saw is likely caused by the state transition from 1 to 3.
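The rule above can be sketched, very roughly, as a function of how many
clients hold the file open for read and for read/write. This is only a
hypothetical illustration of the three states; the real logic lives in the
MDS capability (caps) code, not in a simple function like this:

```python
def cache_mode(readers, writers):
    """Illustrative sketch of the CephFS cache coherency states.

    readers: number of clients with the file open read-only
    writers: number of clients with the file open for read/write
    """
    clients = readers + writers
    if clients <= 1:
        # state 1: a single client, even a writer, may use its page cache
        return "page cache allowed"
    if writers == 0:
        # state 2: many readers, no writers -> shared caching is safe
        return "page cache allowed"
    # state 3: multiple clients and at least one writer -> synchronous I/O;
    # reads stall until the writer's dirty pages are flushed
    return "page cache disallowed"
```

In this sketch, a second client opening the file for write moves it from
state 1 to state 3, which matches the stall described below.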

On Fri, Mar 8, 2019 at 8:15 AM Gregory Farnum <[email protected]> wrote:
>
> In general, no, this is not an expected behavior.
>
> My guess would be that something odd is happening with the other clients you 
> have to the system, and there's a weird pattern with the way the file locks 
> are being issued. Can you be more precise about exactly what workload you're 
> running, and get the output of the session list on your MDS while doing so?
> -Greg
>
> On Wed, Mar 6, 2019 at 9:49 AM Andrew Richards 
> <[email protected]> wrote:
>>
>> We discovered recently that our CephFS mount appeared to be halting reads 
>> while writes were being synched to the Ceph cluster, to the point that it 
>> was affecting applications.
>>
>> I also posted this as a Gist with embedded graph images to help illustrate: 
>> https://gist.github.com/keeperAndy/aa80d41618caa4394e028478f4ad1694
>>
>> The following is the plain text from the Gist.
>>
>> First, details about the host:
>>
>> ````
>>     $ uname -r
>>     4.16.13-041613-generic
>>
>>     $ egrep 'xfs|ceph' /proc/mounts
>>     192.168.1.115:6789,192.168.1.116:6789,192.168.1.117:6789:/ /cephfs ceph 
>> rw,noatime,name=cephfs,secret=<hidden>,rbytes,acl,wsize=16777216 0 0
>>     /dev/mapper/tst01-lvidmt01 /rbd_xfs xfs 
>> rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota 0 0
>>
>>     $ ceph -v
>>     ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
>> (stable)
>>
>>     $ cat /proc/net/bonding/bond1
>>     Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>>
>>     Bonding Mode: adaptive load balancing
>>     Primary Slave: None
>>     Currently Active Slave: net6
>>     MII Status: up
>>     MII Polling Interval (ms): 100
>>     Up Delay (ms): 200
>>     Down Delay (ms): 200
>>
>>     Slave Interface: net8
>>     MII Status: up
>>     Speed: 10000 Mbps
>>     Duplex: full
>>     Link Failure Count: 2
>>     Permanent HW addr: e4:1d:2d:17:71:e1
>>     Slave queue ID: 0
>>
>>     Slave Interface: net6
>>     MII Status: up
>>     Speed: 10000 Mbps
>>     Duplex: full
>>     Link Failure Count: 1
>>     Permanent HW addr: e4:1d:2d:17:71:e0
>>     Slave queue ID: 0
>>
>> ````
>>
>> We had CephFS mounted alongside an XFS filesystem made up of 16 RBD images 
>> aggregated under LVM as our storage targets. The link to the Ceph cluster 
>> from the host is a mode 6 2x10GbE bond (bond1 above).
>>
>> We started capturing network counters from the Ceph cluster connection 
>> (bond1) on the host using ifstat at its most granular setting of 0.1 
>> (sampling every tenth of a second). We then ran various overlapping read and 
>> write operations in separate shells on the same host to obtain samples of 
>> how our different means of accessing Ceph handled this. We converted our 
>> ifstat output to CSV and inserted it into a spreadsheet to visualize the 
>> network activity.
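>>
>> (For reference, a capture along these lines can be done with something 
>> like the following; the timestamp flag and awk field numbers are 
>> illustrative, not the exact commands we ran:)
>>
>> ````
>>     # sample bond1 counters every 0.1s, with timestamps
>>     $ ifstat -i bond1 -t 0.1 > bond1_ifstat.log
>>     # turn the whitespace-separated columns into CSV for a spreadsheet
>>     $ awk 'NR>2 {print $1","$2","$3}' bond1_ifstat.log > bond1_ifstat.csv
>> ````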
>>
>> We found that the CephFS kernel mount did indeed appear to pause ongoing 
>> reads when writes were being flushed from the page cache to the Ceph cluster.
>>
>> We wanted to see if we could make this more pronounced, so we added a 
>> 6Gb-limit tc filter to the interface and re-ran our tests. This yielded much 
>> lengthier delay periods in the reads while the writes were more slowly 
>> flushed from the page cache to the Ceph cluster.
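>>
>> (The throttle was a tc token-bucket filter along these lines; the exact 
>> burst and latency parameters here are illustrative:)
>>
>> ````
>>     # cap egress on the Ceph-facing bond to ~6 Gbit/s
>>     $ tc qdisc add dev bond1 root tbf rate 6gbit burst 1m latency 50ms
>>     # remove the cap afterwards
>>     $ tc qdisc del dev bond1 root
>> ````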
>>
>> A more restrictive 2Gbit-limit tc filter produced even lengthier delays of 
>> our reads as the writes were synched to the cluster.
>>
>> When we tested the same I/O on the RBD-backed XFS file system on the same 
>> host, we found a very different pattern. The reads seemed to be given 
>> priority over the write activity, and the writes were only slowed, not 
>> halted.
>>
>> Finally, we tested overlapping SMB client reads and writes to a Samba 
>> share served by the userspace, libcephfs-based vfs_ceph module. In this 
>> case, while raw throughput was lower than that of the kernel client, the 
>> reads and writes did not interrupt each other at all.
>>
>> Is this expected behavior for the CephFS kernel drivers? Can a CephFS kernel 
>> client really not read and write to the file system simultaneously?
>>
>> Thanks,
>> Andrew Richards
>> Senior Systems Engineer
>> Keeper Technology, LLC
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
