Re: [Gluster-users] heaps split-brains during back-transfert

Vijaikumar M Mon, 03 Aug 2015 20:54:41 -0700

Adding Raghavendra.G for RDMA issue...


Hi Geoffrey,

Please find my comments in-line..

Thanks,
Vijay


On Monday 03 August 2015 09:15 PM, Geoffrey Letessier wrote:

Hi Vijay,
Yes of course, I sent my email after making some tests and checks, and the result was still wrong (even a couple of hours/1 day after having forced the start of every brick) … until I decided to run a « du » on every quota path. Now everything seems ~OK, as you can read below:
# gluster volume quota vol_home list
Path                   Hard-limit  Soft-limit  Used     Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/simlab_team           5.0TB       80%         1.2TB    3.8TB      No                    No
/amyloid_team          7.0TB       80%         4.9TB    2.1TB      No                    No
/amyloid_team/nguyen   3.5TB       80%         2.0TB    1.5TB      No                    No
/sacquin_team          10.0TB      80%         55.3GB   9.9TB      No                    No
/baaden_team           20.0TB      80%         11.5TB   8.5TB      No                    No
/derreumaux_team       5.0TB       80%         2.2TB    2.8TB      No                    No
/sterpone_team         14.0TB      80%         9.3TB    4.7TB      No                    No
/admin_team            1.0TB       80%         15.8GB   1008.2GB   No                    No
# for path in $(gluster volume quota vol_home list | awk 'NR>2 {print $1}'); do pdsh -w storage[1,3] "du -sh /export/brick_home/brick{1,2}/data$path"; done
storage1: 219G    /export/brick_home/brick1/data/simlab_team
storage3: 334G    /export/brick_home/brick1/data/simlab_team
storage1: 307G    /export/brick_home/brick2/data/simlab_team
storage3: 327G    /export/brick_home/brick2/data/simlab_team
storage1: 1,2T    /export/brick_home/brick1/data/amyloid_team
storage3: 1,2T    /export/brick_home/brick1/data/amyloid_team
storage1: 1,2T    /export/brick_home/brick2/data/amyloid_team
storage3: 1,2T    /export/brick_home/brick2/data/amyloid_team
storage1: 505G    /export/brick_home/brick1/data/amyloid_team/nguyen
storage1: 483G    /export/brick_home/brick2/data/amyloid_team/nguyen
storage3: 508G    /export/brick_home/brick1/data/amyloid_team/nguyen
storage3: 503G    /export/brick_home/brick2/data/amyloid_team/nguyen
storage3: 16G     /export/brick_home/brick1/data/sacquin_team
storage1: 14G     /export/brick_home/brick1/data/sacquin_team
storage3: 13G     /export/brick_home/brick2/data/sacquin_team
storage1: 13G     /export/brick_home/brick2/data/sacquin_team
storage1: 3,2T    /export/brick_home/brick1/data/baaden_team
storage1: 2,8T    /export/brick_home/brick2/data/baaden_team
storage3: 2,9T    /export/brick_home/brick1/data/baaden_team
storage3: 2,7T    /export/brick_home/brick2/data/baaden_team
storage3: 588G    /export/brick_home/brick1/data/derreumaux_team
storage1: 566G    /export/brick_home/brick1/data/derreumaux_team
storage1: 563G    /export/brick_home/brick2/data/derreumaux_team
storage3: 610G    /export/brick_home/brick2/data/derreumaux_team
storage3: 2,5T    /export/brick_home/brick1/data/sterpone_team
storage1: 2,7T    /export/brick_home/brick1/data/sterpone_team
storage3: 2,4T    /export/brick_home/brick2/data/sterpone_team
storage1: 2,4T    /export/brick_home/brick2/data/sterpone_team
storage3: 519M    /export/brick_home/brick1/data/admin_team
storage1: 11G     /export/brick_home/brick1/data/admin_team
storage3: 974M    /export/brick_home/brick2/data/admin_team
storage1: 4,0G    /export/brick_home/brick2/data/admin_team

In short:
simlab_team: ~1.2TB
amyloid_team: ~4.8TB
amyloid_team/nguyen: ~2TB
sacquin_team: ~56GB
baaden_team: ~11.6TB
derreumaux_team: 2.3TB
sterpone_team: ~10TB
admin_team: ~16.5GB
There are still some differences, but overall it is quite correct (except for the sterpone_team quota).
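The per-team totals in the summary can be reproduced by adding up the four per-brick figures reported by du. The following is a minimal Python sketch of that arithmetic, assuming that du -sh prints binary units (K, M, G, T) and uses a decimal comma as in the French-locale output above; the helper names parse_du_size and total_gib are illustrative and not part of any Gluster tool.

```python
# Illustrative helpers; not part of GlusterFS or coreutils.
UNITS_IN_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def parse_du_size(text):
    """Convert a `du -sh` size such as '1,2T' or '519M' to GiB."""
    text = text.strip().replace(",", ".")  # French locale prints a decimal comma
    number, unit = text[:-1], text[-1].upper()
    return float(number) * UNITS_IN_GIB[unit]


def total_gib(sizes):
    """Sum a list of `du -sh` sizes and return the total in GiB."""
    return sum(parse_du_size(size) for size in sizes)


# Per-brick values copied from the du output (storage1 and storage3, brick1 and brick2).
simlab = ["219G", "334G", "307G", "327G"]
sterpone = ["2,5T", "2,7T", "2,4T", "2,4T"]

print(f"simlab_team:   {total_gib(simlab) / 1024:.2f} T")
print(f"sterpone_team: {total_gib(sterpone) / 1024:.2f} T")
```

Because each input was already rounded by du -sh, the sums are only approximate, which is one reason they can differ slightly from the figures shown by 'quota list'.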
But I also noticed something strange: here are the results of every « du » I ran to force the « recompute » of the quota size (on the GlusterFS mount point):
# du -sh /home/simlab_team/
1,2T    /home/simlab_team/
# du -sh /home/amyloid_team/
4,7T    /home/amyloid_team/
# du -sh /home/sacquin_team/
56G     /home/sacquin_team/
# du -sh /home/baaden_team/
12T     /home/baaden_team/
# du -sh /home/derreumaux_team/
2,3T    /home/derreumaux_team/
# du -sh /home/sterpone_team/
9,9T    /home/sterpone_team/
As you can see above, I don't understand why the quota size computed by the quota daemon is different from a "du", especially concerning the quota size of /sterpone_team.

The du command can round off the values; could you check the values with 'du -sk'?
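To get a feel for how much rounding alone could contribute, here is a small Python sketch. It assumes that every 'du -sh' figure was rounded up to one decimal place (which is, to the best of my knowledge, the default behaviour of GNU du -h), so a displayed value x stands for a true value somewhere in (x - 0.1, x]. The function name rounding_interval is hypothetical, and values are handled as integer tenths to avoid floating-point noise.

```python
def rounding_interval(displayed_tenths):
    """Bounds, in tenths of a unit, for a sum of values that were each
    rounded up to one decimal place before being displayed."""
    upper = sum(displayed_tenths)
    # Each displayed value may overstate the true value by less than 0.1.
    lower = upper - len(displayed_tenths)
    return lower, upper


# sterpone_team per-brick sizes from du -sh, in tenths of a T: 2,5T 2,7T 2,4T 2,4T
low, high = rounding_interval([25, 27, 24, 24])
print(f"true total between {low / 10:.1f} and {high / 10:.1f} T")
```

Under that assumption the four sterpone_team bricks add up to somewhere between 9.6 and 10.0 T, so the exact 'du -sk' values are still needed before drawing conclusions about the 9.3TB shown by 'quota list'.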

Now, concerning all the hangs I encountered, can you tell me the brand of your Infiniband interconnect? On my side, we use QLogic; maybe the problem originates here (Intel/QLogic and Mellanox are quite different).
Concerning the brick logs, I just noticed that I have a lot of errors in one of my brick logs, and the file takes around 5GB. Here is an extract:
# tail -30l /var/log/glusterfs/bricks/export-brick_home-brick1-data.log
[2015-08-03 15:32:37.408204] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.410017] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.410689] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.410860] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.412638] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.413435] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.413640] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.415325] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.416102] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.416308] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.418025] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.418799] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.419001] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.420681] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.421416] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.421607] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.423208] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.423882] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.424089] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.425863] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.426581] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.426790] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.428438] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.429133] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
[2015-08-03 15:32:37.429325] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
The message "W [MSGID: 120003] [quota.c:759:quota_build_ancestry_cbk] 0-vol_home-quota: parent is NULL [Argument invalide]" repeated 9016 times between [2015-08-03 15:31:55.379522] and [2015-08-03 15:32:00.997113]
[2015-08-03 15:32:37.442244] I [MSGID: 115036] [server.c:545:server_rpc_notify] 0-vol_home-server: disconnecting connection from lucifer.lbt.ibpc.fr-21153-2015/08/03-15:31:23:33181-vol_home-client-0-0-0
[2015-08-03 15:32:37.442286] I [MSGID: 101055] [client_t.c:419:gf_client_unref] 0-vol_home-server: Shutting down connection lucifer.lbt.ibpc.fr-21153-2015/08/03-15:31:23:33181-vol_home-client-0-0-0
The message "E [MSGID: 113104] [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could not read the link from the gfid handle /export/brick_home/brick1/data/.glusterfs/19/b6/19b67130-b409-4666-9237-2661241a8847 [Aucun fichier ou dossier de ce type]" repeated 755 times between [2015-08-03 15:31:25.553801] and [2015-08-03 15:31:43.528305]
The message "E [MSGID: 113104] [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could not read the link from the gfid handle /export/brick_home/brick1/data/.glusterfs/81/5a/815acde3-7f47-410b-9131-e8d75c71a5bd [Aucun fichier ou dossier de ce type]" repeated 8147 times between [2015-08-03 15:31:25.521255] and [2015-08-03 15:31:53.593932]
Do you have an idea where this issue comes from and what I have to do to fix it?

We will investigate this issue and update you soon.

# grep -rc "\] E \[" /var/log/glusterfs/bricks/export-brick_home-brick{1,2}-data.log
/var/log/glusterfs/bricks/export-brick_home-brick1-data.log:11038933
/var/log/glusterfs/bricks/export-brick_home-brick2-data.log:243
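With eleven million error lines in one file, a breakdown per emitting function is often more informative than a single grep count. Below is a hedged Python sketch that groups entries by the function named in the "[file:line:function]" field. The regular expression is derived only from the lines quoted in this thread, so it is an assumption about the log layout rather than a documented format, and count_by_function is a hypothetical helper name.

```python
import re
from collections import Counter

# Prefix of a regular log entry: "[timestamp] LEVEL [file:line:function]".
# Some entries carry an extra "[MSGID: n]" field before the source location.
LOG_PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*\d+\]\s*)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)


def count_by_function(lines, level="E"):
    """Count entries of one severity level, grouped by the emitting function."""
    counts = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match and match.group("level") == level:
            counts[match.group("func")] += 1
    return counts


# Three entries shortened from the extract in this thread (backtrace elided as "(...)").
sample = [
    "[2015-08-03 15:32:37.408204] E [dict.c:1418:dict_copy_with_ref] (...) 0-dict: invalid argument: dict",
    "[2015-08-03 15:32:37.410017] E [dict.c:1418:dict_copy_with_ref] (...) 0-dict: invalid argument: dict",
    "[2015-08-03 15:32:37.442244] I [MSGID: 115036] [server.c:545:server_rpc_notify] 0-vol_home-server: disconnecting connection",
]
print(count_by_function(sample).most_common())
```

For a real brick log, an open file object can be passed instead of the sample list so the 5GB file is read line by line. Summary lines of the form 'The message "..." repeated N times' do not start with a timestamp and are skipped by this sketch, so the counts are a lower bound.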

FYI I updated GlusterFS to the latest version (v3.7.3) 2 days ago.
Thanks in advance for your next answers, and thanks for all your help (to the whole support team).
Best,
Geoffrey

------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]<mailto:[email protected]>
On 3 August 2015 at 08:51, Vijaikumar M <[email protected]> wrote:
Hi Geoffrey,

Please find my comments in-line.


On Saturday 01 August 2015 04:10 AM, Geoffrey Letessier wrote:
Hello,
As Krutika said, I successfully resolved all the split-brains (more than 3450) that appeared after the first data transfer from one backup server to my new and fresh volume, but…
The next step to validate my new volume was to enable quota on it; and now, more than one day after this activation, all the results are still completely wrong:
Example:
# df -h /home/sterpone_team
Filesystem            Size Used Avail Use% Mounted on
ib-storage1:vol_home.tcp
                       14T 3,3T   11T  24% /home
# pdsh -w storage[1,3] du -sh /export/brick_home/brick{1,2}/data/sterpone_team
storage3: 2,5T    /export/brick_home/brick1/data/sterpone_team
storage3: 2,4T    /export/brick_home/brick2/data/sterpone_team
storage1: 2,7T    /export/brick_home/brick1/data/sterpone_team
storage1: 2,4T    /export/brick_home/brick2/data/sterpone_team
As you can read, the data for this account totals around 10TB, but quota displays only 3.3TB used.
Worse:
# pdsh -w storage[1,3] du -sh /export/brick_home/brick{1,2}/data/baaden_team
storage3: 2,9T    /export/brick_home/brick1/data/baaden_team
storage3: 2,7T    /export/brick_home/brick2/data/baaden_team
storage1: 3,2T    /export/brick_home/brick1/data/baaden_team
storage1: 2,8T    /export/brick_home/brick2/data/baaden_team
# df -h /home/baaden_team/
Filesystem            Size Used Avail Use% Mounted on
ib-storage1:vol_home.tcp
                       20T 786G   20T   4% /home
# gluster volume quota vol_home list /baaden_team
Path           Hard-limit  Soft-limit  Used      Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/baaden_team   20.0TB      80%         785.6GB   19.2TB     No                    No
This account is around 11.6TB and quota detects only 786GB used…
As you mentioned below, some of the bricks were down. 'quota list' will only show the aggregated value of the online bricks. Could you please check 'quota list' when all the bricks are up and running?
I suspect the quota initiation might not have completed because a brick was down.
Can someone help me fix it? Bear in mind that I previously updated GlusterFS from 3.5.3 to 3.7.2 precisely to solve a similar problem…
For information, in quotad log file:
[2015-07-31 22:13:00.574361] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-vol_home-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-31 22:13:00.574507] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol_home-client-7: Server lk version = 1
Is there any causal connection (client/server version conflict)?
Here is what I noticed in my /var/log/glusterfs/quota-mount-vol_home.log file:
… <same kind of lines>
[2015-07-31 21:26:15.247269] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:15.250272] E [socket.c:2332:socket_connect_finish] 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
[2015-07-31 21:26:19.250545] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:19.253643] E [socket.c:2332:socket_connect_finish] 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
… <same kind of lines>
The connection is refused because the brick is down.
<A few minutes later:> OK, this was due to one brick which was down. It’s strange: since I updated GlusterFS to 3.7.x I have noticed a lot of bricks going down, sometimes a few moments after starting the volume, sometimes after a couple of days/weeks… This never happened with GlusterFS versions 3.3.1 and 3.5.3.
Could you please provide the brick log? We will check the log for this issue; once it is fixed, we can initiate quota healing again.
Now, I need to stop and start the volume because I am again noticing some hangs with "gluster volume quota …", "df", etc. Once again, I never noticed this kind of hang with the previous versions of GlusterFS I used; is it "expected"?
Following your previous mail we tried to re-create the hang problem, but we could not reproduce it.
Once again: thank you very much in advance.
Geoffrey

------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]<mailto:[email protected]>
On 31 July 2015 at 11:26, Niels de Vos <[email protected]> wrote:
On Wed, Jul 29, 2015 at 12:44:38AM +0200, Geoffrey Letessier wrote:
OK, thank you Niels for this explanation. Now, this makes sense.
And concerning all the split-brains that appeared during the back-transfer, do you have an idea where this is coming from?
Sorry, no, I don't know how that is happening in your environment. I'll
try to find someone who understands more about it and can help you with
that.

Niels
Best,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]<mailto:[email protected]>
On 29 July 2015 at 00:02, Niels de Vos <[email protected]> wrote:
On Tue, Jul 28, 2015 at 03:46:37PM +0200, Geoffrey Letessier wrote:
Hi,

In addition to all the split-brains reported, is it normal to notice
thousands and thousands (several tens, even hundreds, of thousands) of
broken symlinks when browsing the .glusterfs directory on each brick?
Yes, I think it is normal. A symlink points to a particular filename,
possibly in a different directory. If the target file is located on a
different brick, the symlink points to a non-local file.

Consider this example with two bricks in a distributed volume:
- file: README
- symlink: IMPORTANT -> README
When the distribution algorithm is done, README 'hashes' to brick-A. The
symlink 'hashes' to brick-B. This means that README will be located on
brick-A, and the symlink with name IMPORTANT would be located on
brick-B. Because README is not on the same brick as IMPORTANT, the
symlink points to the non-existing file README on brick-B.

However, when a Gluster client reads the target of symlink IMPORTANT,
the Gluster client calculates the location of README and will know that
README can be found on brick-A.
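
The example with README and IMPORTANT can be mimicked with a toy placement function. This is only a sketch: it uses a CRC32 of the file name modulo the number of bricks, which is an assumption made purely for illustration and is not the hash GlusterFS actually uses for distribution. The names toy_brick_for and resolve_symlink are hypothetical.

```python
import zlib

BRICKS = ["brick-A", "brick-B"]  # the two bricks of the example


def toy_brick_for(name):
    """Toy placement: hash the file NAME only and pick a brick.

    This is NOT the real GlusterFS distribution hash; it only shows that
    placement depends on the name, not on what a symlink points to.
    """
    return BRICKS[zlib.crc32(name.encode("utf-8")) % len(BRICKS)]


def resolve_symlink(link_name, target_name):
    """Return (brick holding the symlink, brick holding its target)."""
    return toy_brick_for(link_name), toy_brick_for(target_name)


link_brick, target_brick = resolve_symlink("IMPORTANT", "README")
print(f"IMPORTANT is stored on {link_brick}; its target README is on {target_brick}")
if link_brick != target_brick:
    print("Seen from the symlink's brick the target is missing, "
          "yet a client can still compute where README lives.")
```

Whether the two names land on different bricks depends on the hash values; the point is that each name is placed independently, so a symlink that looks dangling on a single brick is not necessarily broken from the client's point of view.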

I hope that makes sense?

Niels
For the moment, I have just synchronized one remote directory (around 30TB
and a few million files) into my new volume. No other operations on
files on this volume have been done yet.
How can I fix it? Can I delete these dead symlinks? How can I fix all
my split-brains?
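
Before deciding anything about those links, it can help to simply list them. The following Python sketch walks a directory tree and reports symlinks whose target does not exist when resolved locally. It is read-only, it deletes nothing, and the function name find_dangling_symlinks is hypothetical; a link reported here may still be perfectly valid from the point of view of a Gluster client.

```python
import os


def find_dangling_symlinks(root):
    """Yield symlinks under `root` whose target does not exist locally."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows the link.
            if os.path.islink(path) and not os.path.exists(path):
                yield path


def count_dangling_symlinks(root):
    """Return how many locally dangling symlinks exist under `root`."""
    return sum(1 for _ in find_dangling_symlinks(root))
```

For example, count_dangling_symlinks("/export/brick_home/brick1/data/.glusterfs/7b/d2") would give the number of locally unresolvable links in the directory listed below.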

Here is an example of a ls:
[root@cl-storage3 ~]# cd /export/brick_home/brick1/data/.glusterfs/7b/d2/
[root@cl-storage3 d2]# ll
total 8,7M
13706 drwx------ 2 root root 8,0K 26 juil. 17:22 .
2147483784 drwx------ 258 root root 8,0K 20 juil. 23:07 ..
2148444137 -rwxrwxrwx 2 baaden baaden_team 173K 22 mai 2008 7bd200dd-1774-4395-9065-605ae30ec18b
1559384 -rw-rw-r-- 2 tarus amyloid_team 4,3K 19 juin 2013 7bd2155c-7a05-4edc-ae77-35ed7e16afbc
287295 lrwxrwxrwx 1 root root 58 20 juil. 23:38 7bd2370a-100b-411e-89a4-d184da9f0f88 -> ../../a7/59/a759de6f-cdf5-43dd-809a-baf81d103bf7/prop-base
2149090201 -rw-rw-r-- 2 tarus amyloid_team 76K 8 mars 2014 7bd2497f-d24b-4b19-a1c5-80a4956e56a1
2148561174 -rw-r--r-- 2 tran derreumaux_team 575 14 févr. 07:54 7bd25db0-67f5-43e5-a56a-52cf8c4c60dd
1303943 -rw-r--r-- 2 tran derreumaux_team 576 10 févr. 06:06 7bd25e97-18be-4faf-b122-5868582b4fd8
1308607 -rw-r--r-- 2 tran derreumaux_team 414K 16 juin 11:05 7bd2618f-950a-4365-a753-723597ef29f5
45745 -rw-r--r-- 2 letessier admin_team 585 5 janv. 2012 7bd265c7-e204-4ee8-8717-e4a0c393fb0f
2148144918 -rw-rw-r-- 2 tarus amyloid_team 107K 28 févr. 2014 7bd26c5b-d48a-481a-9ca6-2dc27768b5ad
13705 -rw-rw-r-- 2 tarus amyloid_team 25K 4 juin 2014 7bd27e4c-46ba-4f21-a766-389bfa52fd78
1633627 -rw-rw-r-- 2 tarus amyloid_team 75K 12 mars 2014 7bd28631-90af-4c16-8ff0-c3d46d5026c6
1329165 -rw-r--r-- 2 tran derreumaux_team 175 15 juin 23:40 7bd2957e-a239-4110-b3d8-b4926c7f060b
797803 lrwxrwxrwx 2 baaden baaden_team 26 2 avril 2007 7bd29933-1c80-4c6b-ae48-e64e4da874cb -> ../divided/a7/2a7o.pdb1.gz
1532463 -rw-rw-rw- 2 baaden baaden_team 1,8M 2 nov. 2009 7bd29d70-aeb4-4eca-ac55-fae2d46ba911
1411112 -rw-r--r-- 2 sterpone sterpone_team 3,1K 2 mai 2012 7bd2a5eb-62a4-47fc-b149-31e10bd3c33d
2148865896 -rw-r--r-- 2 tran derreumaux_team 2,1M 15 juin 23:46 7bd2ae9c-18ca-471f-a54a-6e4aec5aea89
2148762578 -rw-rw-r-- 2 tarus amyloid_team 154K 11 mars 2014 7bd2b7d7-7745-4842-b7b4-400791c1d149
149216 -rw-r--r-- 2 vamparys sacquin_team 241K 17 mai 2013 7bd2ba98-6a42-40ea-87ea-acb607d73cb5
2148977923 -rwxr-xr-x 2 murail baaden_team 23K 18 juin 2012 7bd2cf57-19e7-451c-885d-fd02fd988d43
1176623 -rw-rw-r-- 2 tarus amyloid_team 227K 8 mars 2014 7bd2d92c-7ec8-4af8-9043-49d1908a99dc
1172122 lrwxrwxrwx 2 sterpone sterpone_team 61 17 avril 12:49 7bd2d96e-e925-45f0-a26a-56b95c084122 -> ../../../../../src/libs/ck-libs/ParFUM-Tops-Dev/ParFUM_TOPS.h
1385933 -rw-r--r-- 2 tran derreumaux_team 2,9M 16 juin 05:29 7bd2df54-17d2-4644-96b7-f8925a67ec1e
745899 lrwxrwxrwx 1 root root 58 22 juil. 09:50 7bd2df83-ce58-4a17-aca8-a32b71e953d4 -> ../../5c/39/5c39010f-fa77-49df-8df6-8d72cf74fd64/model_009
2149100186 -rw-rw-r-- 2 tarus amyloid_team 494K 17 mars 2014 7bd2e865-a2f4-4d90-ab29-dccebe2e3440
Best.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]<mailto:[email protected]>
On 27 July 2015 at 22:57, Geoffrey Letessier <[email protected]> wrote:
Dears,
For a couple of weeks (more than one month), our computing production has been stopped due to several, but amazing, troubles with GlusterFS.
After having noticed a big problem with incorrect quota sizes accounted for many, many files, I decided, under the guidance of the Gluster support team, to upgrade my storage cluster from version 3.5.3 to the latest (3.7.2-3), because these bugs are theoretically fixed in this branch. Now, since I did this upgrade, it is an amazing mess and I cannot restart production.
Indeed:
1 - The RDMA protocol is not working and hangs my system / shell commands; only the TCP protocol (over Infiniband) is more or less operational. It’s not a blocking point but…
2 - Read/write performance is relatively low.
3 - Thousands of split-brains have appeared.
So, for the moment, I believe GlusterFS 3.7 is not actually production ready.
Concerning the third point: after having destroyed all my volumes (RAID re-init, new partition, GlusterFS volumes, etc.) and recreated the main one, I tried to back-transfer my data from the archive/backup server into this new volume, and I noted a lot of errors in my mount log file, as you can read in this extract:
[2015-07-26 22:35:16.962815] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:16.975206] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:28.719935] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.764891] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
[2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.775037] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
[2015-07-26 22:35:29.776857] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-26 22:35:29.800535] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0
And when I try to browse some folders (still in mount log file):
[2015-07-27 09:00:19.005763] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 09:00:22.322316] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
[2015-07-27 09:00:23.008771] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
[2015-07-27 08:59:50.359187] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
[2015-07-27 09:00:02.500419] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
[2015-07-27 09:00:02.506925] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
[2015-07-27 09:00:23.001121] W [MSGID: 108008] [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2015-07-27 09:00:26.231262] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
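Since the same file shows up many times in these extracts, it can be useful to reduce the mount log to the distinct gfid mismatches. The Python sketch below does that with a regular expression modelled only on the "Gfid mismatch detected for" entries quoted in this thread; the pattern is therefore an assumption about the message layout, and unique_mismatches is a hypothetical helper name.

```python
import re

GFID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
MISMATCH = re.compile(
    rf"Gfid mismatch detected for <(?P<parent>{GFID})/(?P<name>[^>]+)>,\s*"
    rf"(?P<gfid_a>{GFID}) on (?P<client_a>\S+) and\s*"
    rf"(?P<gfid_b>{GFID}) on (?P<client_b>[^\s.]+)"
)


def unique_mismatches(lines):
    """Map (parent gfid, file name) -> {client: gfid} for each distinct mismatch."""
    found = {}
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            found[(match["parent"], match["name"])] = {
                match["client_a"]: match["gfid_a"],
                match["client_b"]: match["gfid_b"],
            }
    return found


# The same entry logged twice, taken from the extracts in this thread.
sample = [
    "[2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] "
    "0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, "
    "e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and "
    "820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.",
    "[2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] "
    "0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, "
    "e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and "
    "820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.",
]
for (parent, name), gfids in unique_mismatches(sample).items():
    print(parent, name, gfids)
```

The resulting dictionary gives one entry per affected file, with the gfid seen on each replica, which is a convenient starting list when resolving the mismatches one by one.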
And, above all, when browsing folders I get a lot of input/output errors.

Currently I have 6.2M inodes and roughly 30TB in my "new" volume.
For the moment, quota is disabled to increase the IO performance during the back-transfer…
You can also find in the attachments:
- an "ls" result
- a split-brain research result
- the volume information and status
- a complete volume heal info
Hoping this can help you to help me fix all my problems and reopen the computing production.
Thanks in advance,
Geoffrey

PS: « Erreur d’Entrée/Sortie » = « Input / Output Error »
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]<mailto:[email protected]>
<ls_example.txt>
<split_brain__20150725.txt>
<vol_home_healinfo.txt>
<vol_home_info.txt>
<vol_home_status.txt>

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
