Hi!

I have run into a confusing problem:

When I write to CephFS and RBD at the same time, after a while the process
writing to RBD hangs, and I find this message:

kernel:rbd:rbd0: encountered watch error: -10 

I was able to reproduce it with the following steps:

- run 2 dd processes writing to CephFS
- at the same time, write files to the RBD device
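The reproduction steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the exact commands used: the mount points `/mnt/cephfs` and `/mnt/rbd`, the file names, and the 10 GiB write size are hypothetical, and it assumes the RBD image is mapped, formatted, and mounted as a filesystem.

```python
import shlex
import subprocess

# Hypothetical mount points: adjust to where CephFS and the
# filesystem on the mapped RBD device are actually mounted.
CEPHFS_DIR = "/mnt/cephfs"
RBD_DIR = "/mnt/rbd"


def build_dd_commands(cephfs_dir: str, rbd_dir: str,
                      cephfs_writers: int = 2,
                      count_mib: int = 10240) -> list[list[str]]:
    """Return argv lists: N dd writers on CephFS plus one dd writer on RBD."""
    cmds = []
    for i in range(cephfs_writers):
        cmds.append(["dd", "if=/dev/zero", f"of={cephfs_dir}/ddtest.{i}",
                     "bs=1M", f"count={count_mib}"])
    # One concurrent writer on the RBD-backed filesystem.
    cmds.append(["dd", "if=/dev/zero", f"of={rbd_dir}/ddtest.rbd",
                 "bs=1M", f"count={count_mib}"])
    return cmds


def run_concurrently(cmds: list[list[str]]) -> list[int]:
    """Start every command at once, then wait for all and return exit codes."""
    procs = [subprocess.Popen(c) for c in cmds]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    for c in build_dd_commands(CEPHFS_DIR, RBD_DIR):
        print(shlex.join(c))
    # Uncomment to actually run the writers (about 30 GiB of data in total):
    # run_concurrently(build_dd_commands(CEPHFS_DIR, RBD_DIR))
```

The commands are only printed by default, so the script can be reviewed before it generates heavy write load.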

I see that many CPUs are in iowait, and many kernel processes are in D
(uninterruptible sleep) state.
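To see which processes are stuck in D state, one option is to scan `/proc/<pid>/stat`, whose third field is the one-letter process state. The sketch below is a hypothetical helper (the names `parse_stat` and `d_state_processes` are illustrative). It only looks at `/proc/<pid>`, not per-thread entries under `/proc/<pid>/task`, but kernel threads such as kswapd appear there as ordinary entries.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Parse a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]  # the single state letter follows ") "
    return pid, comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return sorted (pid, comm) pairs for processes in state D."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if state == "D":
            result.append((pid, comm))
    return sorted(result)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Running this repeatedly while the dd writers are active would show whether the D-state set really is dominated by kswapd and the writeback threads.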

My guesses:

- The processes in D state are mainly kswapd and the writeback threads that
flush dirty pages. When the I/O wait queue of the RBD disk is very long, any
process doing I/O on the RBD disk has to queue and wait for a long time in D
state, and the kernel automatically prints its call stack once it has been
blocked for more than 120 s.

- RBD hangs because the RBD client uses watch-notify to communicate, and high
iowait pressure may interfere with it.

- CephFS and RBD share network bandwidth. We use 40G InfiniBand for Ceph, so
the network is much faster than the disks.

The only workaround I can think of is periodically dropping the page cache
from cron, but that may degrade performance.
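A minimal sketch of that cron workaround, assuming "refresh page cache" means writing to `/proc/sys/vm/drop_caches` (the function name `drop_page_cache` is hypothetical). Writing `1` frees only the clean page cache, while `3` would also free dentries and inodes. Because dirty pages are not dropped, the sketch calls `os.sync()` first. Writing to the real `/proc` file requires root.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path: str = DROP_CACHES) -> None:
    """Write back dirty pages, then ask the kernel to drop the page cache."""
    # drop_caches only frees clean pages, so flush dirty data first.
    os.sync()
    # "1" = page cache only; "2" = dentries and inodes; "3" = both.
    with open(path, "w") as f:
        f.write("1\n")


if __name__ == "__main__":
    drop_page_cache()
```

It could be scheduled from root's crontab with an entry such as `*/10 * * * * /usr/bin/python3 /path/to/drop_cache.py` (interval and path hypothetical). As noted above, every run discards cached data that later reads must fetch again.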

Could someone help me?

Why does RBD hang, and how can I fix it?

I really want to use CephFS and RBD at the same time, but this issue is a
serious problem for a production environment.

Thanks 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
