Hi list,
We have hit and reproduced this issue several times: ceph commits
suicide because "FileStore: sync_entry timed out" after very heavy random IO on
top of RBD.
My test environment is:
4-node Ceph cluster with 20 HDDs for OSDs and 4
Intel DC S3700 SSDs for journals per node, i.e. 80 spindles in total.
48 VMs spread across 12 physical nodes, with 48 RBDs
attached to the VMs 1:1 via QEMU.
Ceph @ 0.58.
XFS was used.
I am using aio-stress (something like fio) to produce random write
requests on top of each RBD.
From ceph -w, ceph reports very high ops (10,000+/s), but
theoretically 80 spindles can provide at most 150*80/2 = 6000 IOPS for 4K random
writes.
When digging into the code, I found that the OSD writes data to the
page cache and then returns. Although it calls ::sync_file_range, that syscall
does not guarantee the data is on disk when it returns; it is an asynchronous
call. So the situation is: random writes are extremely fast since they only go
to the journal and page cache, but once a sync happens it takes a very long
time. Because of the speed gap between the journal and the OSD data disks, the
amount of dirty data that needs to be synced keeps growing, and eventually the
sync exceeds the 600s timeout.
For what it's worth, I have tried to reproduce this with rados
bench, but failed.
Please let me know if you need any more information. Any ideas or
solutions? Thanks
Xiaoxi
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com