[Gluster-users] Mount sometimes stops responding during server's MD RAID check sync_action

Jan Wrona Tue, 16 May 2017 07:14:13 -0700

Hi,

I have three servers in the linked list topology [1], GlusterFS 3.8.10, CentOS 7. Each server has two bricks, both on the same XFS filesystem. The XFS is constructed over the whole MD RAID device:

    md5 : active raid5 sdj1[6] sdh1[8] sde1[2] sdg1[9] sdd1[1] sdi1[5] sdf1[3] sdc1[0]
          6836411904 blocks super 1.2 level 5, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
          bitmap: 2/8 pages [8KB], 65536KB chunk

Everything works fine until one of the RAID devices starts its regular check. During the check, the client's mount sometimes stops responding completely. I'm mounting using Pacemaker's Filesystem OCF RA [2] with OCF_CHECK_LEVEL=20, which basically tries to write a small status file to the filesystem every 2 minutes to see if it is OK. But even this small write operation sometimes times out (2 minutes) during the check. Pacemaker then remounts the Gluster volume and everything goes back to normal.
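To make the symptom measurable outside Pacemaker, a probe along the following lines could be run against the mount while a check is in progress. This is only a rough sketch of the idea of a small status-file write, not the resource agent's actual code; the function name probe_write and the ".probe-" file prefix are made up for illustration.

```python
import os
import tempfile
import time


def probe_write(directory, payload=b"status\n"):
    """Write a small file into `directory`, fsync it, remove it,
    and return the elapsed wall-clock time in seconds.

    Hypothetical helper: it imitates the idea of a status-file write,
    not the Filesystem resource agent's real implementation.
    """
    start = time.monotonic()
    # mkstemp creates and opens a uniquely named file in `directory`.
    fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    try:
        os.write(fd, payload)
        # Force the data out so that the storage path is really exercised.
        os.fsync(fd)
    finally:
        os.close(fd)
        os.unlink(path)
    return time.monotonic() - start
```

Calling probe_write() once on the Gluster mount point and once on a directory on the brick's XFS filesystem during a check would give directly comparable latencies. Note that the call simply blocks for as long as the mount is hung; it does not enforce a timeout of its own.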

I understand that the RAID check is draining a lot of I/O performance, but the underlying XFS remains responsive (of course it is slower, but not nearly as much as Gluster). The check intervals on the servers are not overlapping. I've even decreased /proc/sys/dev/raid/speed_limit_max from the default 200 MB/s to 50 MB/s, but it helped only a little; the mount still tends to freeze for a few seconds during the check.
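For reference, the limit can be read and changed roughly as in the sketch below. The helper names are hypothetical, and it assumes the value in that file is given in kilobytes per second (the default of 200000 matches the 200 MB/s above). Writing to the file requires root, and the setting does not survive a reboot.

```python
from pathlib import Path

# Kernel tunable that caps the MD resync/check speed.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def read_speed_limit(path=SPEED_LIMIT_MAX):
    """Return the current limit as an integer (assumed to be KB/s)."""
    return int(Path(path).read_text().strip())


def write_speed_limit(kb_per_s, path=SPEED_LIMIT_MAX):
    """Set a new limit; needs root when used on the real /proc file."""
    if kb_per_s <= 0:
        raise ValueError("speed limit must be a positive integer")
    Path(path).write_text(f"{int(kb_per_s)}\n")
```

With these helpers, write_speed_limit(50000) would correspond to the 50 MB/s setting, and read_speed_limit() can confirm that the kernel accepted it.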


What are your suggestions to solve this issue?

Regards,
Jan Wrona

[1] https://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/
[2] https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem
