Hi Susant,

You are right, the rebalance process itself looks normal now. But the memory 
usage of the brick process being written to keeps increasing during the 
rebalance. The current task has been running for 16 hours; here is the top 
output.

===================== top ===========================
top - 08:58:27 up 3 days, 12:08,  1 user,  load average: 1.33, 1.18, 1.21
Tasks: 173 total,   1 running, 172 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.0%us, 16.9%sy,  0.0%ni, 65.7%id,  2.7%wa,  0.0%hi,  1.8%si,  0.0%st
Mem:   8060900k total,  7923204k used,   137696k free,  4528380k buffers
Swap:        0k total,        0k used,        0k free,   393444k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8555 root      20   0  950m 143m 1728 S 154.7  1.8 875:01.07 glusterfs
 8479 root      20   0 1284m 139m 1892 S 69.8  1.8 443:25.88 glusterfsd
 8497 root      20   0 2628m 1.8g 1892 S 68.2 23.0 485:31.42 glusterfsd
  874 root      20   0     0    0    0 S  2.3  0.0  65:34.68 jbd2/vdb1-8
   58 root      20   0     0    0    0 S  0.7  0.0  44:44.37 kblockd/0
   99 root      20   0     0    0    0 S  0.7  0.0  39:17.63 kswapd0
   39 root      20   0     0    0    0 S  0.3  0.0   0:16.90 events/4
=====================================================
As you can see, PID 8497 is now using 1.8 GB of memory.

I have taken some state dumps. The later dumps are much bigger than the earlier ones.
================ ls -lh /var/run/gluster/*dump* ================
-rw------- 1 root root 4.1M Dec 17 17:52 mnt-b1-brick.8497.dump.1450345948
-rw------- 1 root root 292M Dec 18 09:08 mnt-b1-brick.8497.dump.1450400909
-rw------- 1 root root 297M Dec 18 09:15 mnt-b1-brick.8497.dump.1450401273
=====================================================

You can download these state dumps (gzipped) from this URL:
http://pan.baidu.com/s/1jHuZCMU
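
In case it helps, this is roughly how I compare two dumps to see which 
allocation types grow between them. It is just a sketch; the section headers 
and the num_allocs field are what I see in my dumps and may differ between 
versions.
============== compare dumps (sketch) ===============
# For each dump, print the 20 usage-type sections with the most live
# allocations, so growth between dumps is easy to spot.
for f in mnt-b1-brick.8497.dump.1450345948 \
         mnt-b1-brick.8497.dump.1450400909; do
    echo "== $f =="
    awk -F= '/^\[/{sec=$0} /^num_allocs=/{print $2, sec}' "$f" \
        | sort -rn | head -20
done
=====================================================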




PuYun
 
From: Susant Palai
Date: 2015-12-17 20:23
To: PuYun
CC: gluster-users
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure?
OK, from your reply the rebalance itself seems to be fine. 
What you can do now is check whether the memory usage of the brick process 
keeps increasing constantly. If that is the case, take multiple state dumps 
intermittently.
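 
Something along these lines should be enough (just a sketch; replace the 
volume name and brick PID with yours, and adjust the interval): 
============== sample loop (sketch) =================
# Every 30 minutes: record the brick's RSS and trigger a statedump.
# Dumps land in /var/run/gluster/ by default.
BRICK_PID=12345     # glusterfsd PID of the suspect brick (replace)
VOLNAME=myvol       # replace with your volume name
while true; do
    echo "$(date) RSS(kB)=$(ps -o rss= -p $BRICK_PID)" >> /var/log/brick-rss.log
    gluster volume statedump $VOLNAME
    sleep 1800
done
=====================================================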
 
Regards,
Susant 
 
----- Original Message -----
From: "PuYun" <[email protected]>
To: "gluster-users" <[email protected]>
Cc: "gluster-users" <[email protected]>
Sent: Thursday, 17 December, 2015 3:57:12 PM
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure?
 
 
 
Hi Susant, 
 
 
Thank you for your instructions. I'll do that. 
 
 
My volume contains more than 2 million leaf subdirectories. Most of the leaf 
subdirectories contain 10~30 small files each. The current total size is about 
900G. There are two bricks, 1T each. The server has 8G of RAM. 
 
 
Previously I saw 3 processes: one glusterfs process for the rebalance and 2 
glusterfsd processes for the bricks. Only 1 glusterfsd occupied very large 
memory, and it is the one associated with the newly added brick. The other 2 
processes seem normal. If that happens again, I will send you the state dumps. 
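 
For reference, this is roughly how I check which glusterfsd belongs to which 
brick (the volume name below is a placeholder): 
============== map PIDs to bricks (sketch) ==========
# 'gluster volume status' lists each brick together with its PID, so the
# growing glusterfsd can be matched to a brick path.
gluster volume status myvol
# The brick path also appears on the process command line:
ps -o pid,cmd -C glusterfsd
=====================================================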
 
 
Thank you. 
 
PuYun 
 
 
 
 
 
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
