It is faster now, but it is interesting to note a difference (a decrease) in the "du -sk" output:
[root@lucifer ~]# du -sk /home/sterpone_team/
10583360073 /home/sterpone_team/
[root@lucifer ~]# time du -sk /home/sterpone_team/
10583360057 /home/sterpone_team/
real 21m21.068s
user 0m1.766s
sys 0m8.864s

Do you have any idea why?

Best,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT Manager & Systems Engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

On 4 August 2015 at 16:53, Geoffrey Letessier <[email protected]> wrote:

> OK, after a couple of hours (my last message was blocked, but it was sent
> around 9:30 AM French time), here is the result:
> # du -sk /home/sterpone_team/
> 10583360073 /home/sterpone_team/
>
> In other words: ~9.86TB
> So, as you can read below, the result is broadly the same as before (with
> 'du -sh': ~9.9TB) and quite different from the gluster volume quota output
> (the difference is ~0.5TB, i.e. 500-600GB, which is very large):
> # gluster volume quota vol_home list /sterpone_team
> Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded?
> ---------------------------------------------------------------------------------------------------------------------------
> /sterpone_team
>
> Thanks,
> Geoffrey
>
> On 4 August 2015 at 10:26, Geoffrey Letessier <[email protected]> wrote:
>
>> Hi Vijay,
>>
>>> du command can round-off the values, could you check the values with 'du
>>> -sk'?
>> It's in progress. I'll let you know the new value ASAP.
>>
>>> We will investigate on this issue and update you soon on the same.
>> FYI, it mainly concerns one brick per replica set (and thus its replica
>> brick: brick1 on storage1 and brick1 on storage2).
>> To avoid to explode my /var partition capacity, yesterday i set my >> brick-log-level parameter to CRITICAL -but now I know there is a big problem >> on several bricks.. >> >> Thanks in advance for the help and fix >> Geoffrey >> >> ------------------------------------------------------ >> Geoffrey Letessier >> Responsable informatique & ingénieur système >> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique >> Institut de Biologie Physico-Chimique >> 13, rue Pierre et Marie Curie - 75005 Paris >> Tel: 01 58 41 50 93 - eMail: [email protected] >> >> Le 4 août 2015 à 05:54, Vijaikumar M <[email protected]> a écrit : >> >>> Adding Raghavendra.G for RDMA issue... >>> >>> >>> Hi Geoffrey, >>> >>> Please find my comments in-line.. >>> >>> Thanks, >>> Vijay >>> >>> >>> On Monday 03 August 2015 09:15 PM, Geoffrey Letessier wrote: >>>> Hi Vijay, >>>> >>>> Yes of course, i sent my email after making some tests and checks and the >>>> result was still wrong (even after a couple of hours/1day after having >>>> forced the start of every bricks) … until i decided to do a « du » on >>>> every quota path. Now, all seems to ~OK as you can read below: >>>> # gluster volume quota vol_home list >>>> Path Hard-limit Soft-limit Used >>>> Available Soft-limit exceeded? Hard-limit exceeded? 
>>>> --------------------------------------------------------------------------------------------------------------------------- >>>> /simlab_team 5.0TB 80% 1.2TB >>>> 3.8TB No No >>>> /amyloid_team 7.0TB 80% 4.9TB >>>> 2.1TB No No >>>> /amyloid_team/nguyen 3.5TB 80% 2.0TB >>>> 1.5TB No No >>>> /sacquin_team 10.0TB 80% 55.3GB >>>> 9.9TB No No >>>> /baaden_team 20.0TB 80% 11.5TB >>>> 8.5TB No No >>>> /derreumaux_team 5.0TB 80% 2.2TB >>>> 2.8TB No No >>>> /sterpone_team 14.0TB 80% 9.3TB >>>> 4.7TB No No >>>> /admin_team 1.0TB 80% 15.8GB >>>> 1008.2GB No No >>>> # for path in $(gluster volume quota vol_home list|awk 'NR>2 {print $1}'); >>>> do pdsh -w storage[1,3] "du -sh /export/brick_home/brick{1,2}/data$path"; >>>> done >>>> storage1: 219G /export/brick_home/brick1/data/simlab_team >>>> storage3: 334G /export/brick_home/brick1/data/simlab_team >>>> storage1: 307G /export/brick_home/brick2/data/simlab_team >>>> storage3: 327G /export/brick_home/brick2/data/simlab_team >>>> storage1: 1,2T /export/brick_home/brick1/data/amyloid_team >>>> storage3: 1,2T /export/brick_home/brick1/data/amyloid_team >>>> storage1: 1,2T /export/brick_home/brick2/data/amyloid_team >>>> storage3: 1,2T /export/brick_home/brick2/data/amyloid_team >>>> storage1: 505G /export/brick_home/brick1/data/amyloid_team/nguyen >>>> storage1: 483G /export/brick_home/brick2/data/amyloid_team/nguyen >>>> storage3: 508G /export/brick_home/brick1/data/amyloid_team/nguyen >>>> storage3: 503G /export/brick_home/brick2/data/amyloid_team/nguyen >>>> storage3: 16G /export/brick_home/brick1/data/sacquin_team >>>> storage1: 14G /export/brick_home/brick1/data/sacquin_team >>>> storage3: 13G /export/brick_home/brick2/data/sacquin_team >>>> storage1: 13G /export/brick_home/brick2/data/sacquin_team >>>> storage1: 3,2T /export/brick_home/brick1/data/baaden_team >>>> storage1: 2,8T /export/brick_home/brick2/data/baaden_team >>>> storage3: 2,9T /export/brick_home/brick1/data/baaden_team >>>> storage3: 2,7T 
/export/brick_home/brick2/data/baaden_team >>>> storage3: 588G /export/brick_home/brick1/data/derreumaux_team >>>> storage1: 566G /export/brick_home/brick1/data/derreumaux_team >>>> storage1: 563G /export/brick_home/brick2/data/derreumaux_team >>>> storage3: 610G /export/brick_home/brick2/data/derreumaux_team >>>> storage3: 2,5T /export/brick_home/brick1/data/sterpone_team >>>> storage1: 2,7T /export/brick_home/brick1/data/sterpone_team >>>> storage3: 2,4T /export/brick_home/brick2/data/sterpone_team >>>> storage1: 2,4T /export/brick_home/brick2/data/sterpone_team >>>> storage3: 519M /export/brick_home/brick1/data/admin_team >>>> storage1: 11G /export/brick_home/brick1/data/admin_team >>>> storage3: 974M /export/brick_home/brick2/data/admin_team >>>> storage1: 4,0G /export/brick_home/brick2/data/admin_team >>>> >>>> In short: >>>> simlab_team: ~1.2TB >>>> amyloid_team: ~4.8TB >>>> amyloid_team/nguyen: ~2TB >>>> sacquin_team: ~56GB >>>> baaden_team: ~11.6TB >>>> derreumaux_team: 2.3TB >>>> sterpone_team: ~10TB >>>> admin_team: ~16.5GB >>>> >>>> There’s still some difference but it’s globally quite correct (except for >>>> sterpone_team quota defined). 
>>>> >>>> But, I also noticed something strange: here are the result of every « du » >>>> i did to force the « recompute » of the quota size (on the glusterfs mount >>>> point): >>>> # du -sh /home/simlab_team/ >>>> 1,2T /home/simlab_team/ >>>> # du -sh /home/amyloid_team/ >>>> 4,7T /home/amyloid_team/ >>>> # du -sh /home/sacquin_team/ >>>> 56G /home/sacquin_team/ >>>> # du -sh /home/baaden_team/ >>>> 12T /home/baaden_team/ >>>> # du -sh /home/derreumaux_team/ >>>> 2,3T /home/derreumaux_team/ >>>> # du -sh /home/sterpone_team/ >>>> 9,9T /home/sterpone_team/ >>>> >>>> As you can above, I dont understand why the quota size computed by quota >>>> daemon is different than a "du", especially concerning the quota size of >>>> /sterpone_team >>>> >>> du command can round-off the values, could you check the values with 'du >>> -sk'? >>> >>> >>> >>>> Now, concerning all hangs i met, can you provide me the brand of your >>>> infiniband interconnect? From my side, we use QLogic -maybe the problem >>>> takes its origin here (Intel/Qlogic and Mellanox are quite different). >>>> >>>> >>>> Concerning the brick logs, I just noticed I have a lot of error on one of >>>> my brick logs and the file take around 5GB. 
Here is an extract: >>>> # tail -30l /var/log/glusterfs/bricks/export-brick_home-brick1-data.log >>>> [2015-08-03 15:32:37.408204] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.410017] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.410689] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.410860] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.412638] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> 
-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.413435] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.413640] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.415325] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.416102] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.416308] E [dict.c:1418:dict_copy_with_ref] >>>> 
(-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.418025] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.418799] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.419001] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.420681] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] 
>>>> [2015-08-03 15:32:37.421416] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.421607] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.423208] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.423882] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.424089] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> 
[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.425863] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.426581] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.426790] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.428438] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.429133] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> 
[0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> [2015-08-03 15:32:37.429325] E [dict.c:1418:dict_copy_with_ref] >>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) >>>> [0x7f021c6f7410] >>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) >>>> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) >>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide] >>>> The message "W [MSGID: 120003] [quota.c:759:quota_build_ancestry_cbk] >>>> 0-vol_home-quota: parent is NULL [Argument invalide]" repeated 9016 times >>>> between [2015-08-03 15:31:55.379522] and [2015-08-03 15:32:00.997113] >>>> [2015-08-03 15:32:37.442244] I [MSGID: 115036] >>>> [server.c:545:server_rpc_notify] 0-vol_home-server: disconnecting >>>> connection from >>>> lucifer.lbt.ibpc.fr-21153-2015/08/03-15:31:23:33181-vol_home-client-0-0-0 >>>> [2015-08-03 15:32:37.442286] I [MSGID: 101055] >>>> [client_t.c:419:gf_client_unref] 0-vol_home-server: Shutting down >>>> connection >>>> lucifer.lbt.ibpc.fr-21153-2015/08/03-15:31:23:33181-vol_home-client-0-0-0 >>>> The message "E [MSGID: 113104] >>>> [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could >>>> not read the link from the gfid handle >>>> /export/brick_home/brick1/data/.glusterfs/19/b6/19b67130-b409-4666-9237-2661241a8847 >>>> [Aucun fichier ou dossier de ce type]" repeated 755 times between >>>> [2015-08-03 15:31:25.553801] and [2015-08-03 15:31:43.528305] >>>> The message "E [MSGID: 113104] >>>> [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could >>>> not read the link from the gfid handle >>>> /export/brick_home/brick1/data/.glusterfs/81/5a/815acde3-7f47-410b-9131-e8d75c71a5bd >>>> [Aucun fichier ou dossier de ce type]" repeated 8147 times between >>>> [2015-08-03 15:31:25.521255] and [2015-08-03 
15:31:53.593932]
>>>> Do you have any idea where this issue comes from and what I have to do to
>>>> fix it?
>>> We will investigate on this issue and update you soon on the same.
>>>
>>>> # grep -rc "\] E \[" /var/log/glusterfs/bricks/export-brick_home-brick{1,2}-data.log
>>>> /var/log/glusterfs/bricks/export-brick_home-brick1-data.log:11038933
>>>> /var/log/glusterfs/bricks/export-brick_home-brick2-data.log:243
>>>>
>>>> FYI I updated GlusterFS to the latest version (v3.7.3) 2 days ago.
>>>>
>>>> Thanks in advance for your answers, and thanks for all your help (the
>>>> whole support team).
>>>> Best,
>>>> Geoffrey
>>>>
>>>> On 3 August 2015 at 08:51, Vijaikumar M <[email protected]> wrote:
>>>>
>>>>> Hi Geoffrey,
>>>>>
>>>>> Please find my comments in-line.
>>>>> >>>>> >>>>> On Saturday 01 August 2015 04:10 AM, Geoffrey Letessier wrote: >>>>>> Hello, >>>>>> >>>>>> As Krutika said, I resolved with success all split-brains (more than >>>>>> 3450) appeared after the first data transfert from one backup server to >>>>>> my new and fresh volume but… >>>>>> >>>>>> The following step to validate my new volume was to enable the quota on >>>>>> it; and now, more than one day after this activation, all the results >>>>>> are still completely wrong: >>>>>> Example: >>>>>> # df -h /home/sterpone_team >>>>>> Filesystem Size Used Avail Use% Mounted on >>>>>> ib-storage1:vol_home.tcp >>>>>> 14T 3,3T 11T 24% /home >>>>>> # pdsh -w storage[1,3] du -sh >>>>>> /export/brick_home/brick{1,2}/data/sterpone_team >>>>>> storage3: 2,5T /export/brick_home/brick1/data/sterpone_team >>>>>> storage3: 2,4T /export/brick_home/brick2/data/sterpone_team >>>>>> storage1: 2,7T /export/brick_home/brick1/data/sterpone_team >>>>>> storage1: 2,4T /export/brick_home/brick2/data/sterpone_team >>>>>> As you can read, all data for this account is around 10TB and quota >>>>>> displays only 3.3TB used. >>>>>> >>>>>> Worse: >>>>>> # pdsh -w storage[1,3] du -sh >>>>>> /export/brick_home/brick{1,2}/data/baaden_team >>>>>> storage3: 2,9T /export/brick_home/brick1/data/baaden_team >>>>>> storage3: 2,7T /export/brick_home/brick2/data/baaden_team >>>>>> storage1: 3,2T /export/brick_home/brick1/data/baaden_team >>>>>> storage1: 2,8T /export/brick_home/brick2/data/baaden_team >>>>>> # df -h /home/baaden_team/ >>>>>> Filesystem Size Used Avail Use% Mounted on >>>>>> ib-storage1:vol_home.tcp >>>>>> 20T 786G 20T 4% /home >>>>>> # gluster volume quota vol_home list /baaden_team >>>>>> Path Hard-limit Soft-limit Used >>>>>> Available Soft-limit exceeded? Hard-limit exceeded? 
>>>>>> --------------------------------------------------------------------------------------------------------------------------- >>>>>> /baaden_team 20.0TB 80% 785.6GB >>>>>> 19.2TB No No >>>>>> This account is around 11.6TB and quota detects only 786GB used… >>>>>> >>>>> As you mentioned below, some of the bricks were down. 'quota list' will >>>>> only show the aggregated value of online bricks, Could you please check >>>>> the 'quota list' when all the bricks are up and running? >>>>> I suspect quota initiate might not have completed because of brick down. >>>>> >>>>>> Can someone help me to fix it -knowing if I've previously updated >>>>>> GlusterFS from 3.5.3 to 3.7.2 it was exactly to solve a similar trouble… >>>>>> >>>>>> For information, in quotad log file: >>>>>> [2015-07-31 22:13:00.574361] I [MSGID: 114047] >>>>>> [client-handshake.c:1225:client_setvolume_cbk] 0-vol_home-client-7: >>>>>> Server and Client lk-version numbers are not same, reopening the fds >>>>>> [2015-07-31 22:13:00.574507] I [MSGID: 114035] >>>>>> [client-handshake.c:193:client_set_lk_version_cbk] 0-vol_home-client-7: >>>>>> Server lk version = 1 >>>>>> >>>>>> is there any causal connection (client/server version conflict)? 
>>>>>> >>>>>> Here what i noticed on my /var/log/glusterfs/quota-mount-vol_home.log >>>>>> file: >>>>>> … <same kind of lines> >>>>>> [2015-07-31 21:26:15.247269] I [rpc-clnt.c:1819:rpc_clnt_reconfig] >>>>>> 0-vol_home-client-5: changing port to 49162 (from 0) >>>>>> [2015-07-31 21:26:15.250272] E [socket.c:2332:socket_connect_finish] >>>>>> 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion >>>>>> refusée) >>>>>> [2015-07-31 21:26:19.250545] I [rpc-clnt.c:1819:rpc_clnt_reconfig] >>>>>> 0-vol_home-client-5: changing port to 49162 (from 0) >>>>>> [2015-07-31 21:26:19.253643] E [socket.c:2332:socket_connect_finish] >>>>>> 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion >>>>>> refusée) >>>>>> … <same kind of lines> >>>>>> >>>>> Connection refused is because brick is down. >>>>> >>>>>> <A few minutes after:> OK, this was due to one brick which was down. >>>>>> It’s strange: since I have updated GlusteFS to 3.7.x I notice a lot of >>>>>> bricks which go down, sometimes a few moment after starting the volume, >>>>>> sometime after a couple of days/weeks… What never happened with >>>>>> GlusterFS version 3.3.1 and 3.5.3. >>>>>> >>>>> Could please provide brick log? We will check the log on this issue, once >>>>> this issue is fixed, we can initiate quota healing again. >>>>> >>>>> >>>>>> Now, I need to stop-and-start the volume because I notice again some >>>>>> hangs with "gluster volume quota … ", "df", etc. One more time, i’ve >>>>>> never noticed this kind of hangs with previous versions of GlusterFS I >>>>>> used; is it "expected"? >>>>> >>>>> From you previous mail we tried re-creating hang problem, however it was >>>>> not re-creating. >>>>> >>>>> >>>>> >>>>>> One more time: thank you very much by advance. 
>>>>>> Geoffrey >>>>>> >>>>>> ------------------------------------------------------ >>>>>> Geoffrey Letessier >>>>>> Responsable informatique & ingénieur système >>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique >>>>>> Institut de Biologie Physico-Chimique >>>>>> 13, rue Pierre et Marie Curie - 75005 Paris >>>>>> Tel: 01 58 41 50 93 - eMail: [email protected] >>>>>> >>>>>> Le 31 juil. 2015 à 11:26, Niels de Vos <[email protected]> a écrit : >>>>>> >>>>>>> On Wed, Jul 29, 2015 at 12:44:38AM +0200, Geoffrey Letessier wrote: >>>>>>>> OK, thank you Niels for this explanation. Now, this makes sense. >>>>>>>> >>>>>>>> And concerning all split-brains appeared during the back-transfert, do >>>>>>>> you have an idea where is this coming from? >>>>>>> >>>>>>> Sorry, no, I dont know how that is happening in your environment. I'll >>>>>>> try to find someone that understands more about it and can help you with >>>>>>> that. >>>>>>> >>>>>>> Niels >>>>>>> >>>>>>>> >>>>>>>> Best, >>>>>>>> Geoffrey >>>>>>>> ------------------------------------------------------ >>>>>>>> Geoffrey Letessier >>>>>>>> Responsable informatique & ingénieur système >>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique >>>>>>>> Institut de Biologie Physico-Chimique >>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris >>>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected] >>>>>>>> >>>>>>>> Le 29 juil. 2015 à 00:02, Niels de Vos <[email protected]> a écrit : >>>>>>>> >>>>>>>>> On Tue, Jul 28, 2015 at 03:46:37PM +0200, Geoffrey Letessier wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> In addition of all split brains reported, is it normal to notice >>>>>>>>>> thousands and thousands (several tens nay hundreds of thousands) >>>>>>>>>> broken symlinks browsing the .glusterfs directory on each brick? >>>>>>>>> >>>>>>>>> Yes, I think it is normal. A symlink points to a particular filename, >>>>>>>>> possibly in a different directory. 
If the target file is located on a >>>>>>>>> different brick, the symlink points to a non-local file. >>>>>>>>> >>>>>>>>> Consider this example with two bricks in a distributed volume: >>>>>>>>> - file: README >>>>>>>>> - symlink: IMPORTANT -> README >>>>>>>>> >>>>>>>>> When the distribution algorithm is done, README 'hashes' to brick-A. >>>>>>>>> The >>>>>>>>> symlink 'hashes' to brick-B. This means that README will be localed on >>>>>>>>> brick-A, and the symlink with name IMPORTANT would be located on >>>>>>>>> brick-B. Because README is not on the same brick as IMPORTANT, the >>>>>>>>> symlink points to the non-existing file README on brick-B. >>>>>>>>> >>>>>>>>> However, when a Gluster client reads the target of symlink IMPORTANT, >>>>>>>>> the Gluster client calculate the location of README and will know that >>>>>>>>> README can be found on brick-A. >>>>>>>>> >>>>>>>>> I hope that makes sense? >>>>>>>>> >>>>>>>>> Niels >>>>>>>>> >>>>>>>>> >>>>>>>>>> For the moment, i just synchronized one remote directory (around 30TB >>>>>>>>>> and a few million files) into my new volume. No other operations on >>>>>>>>>> files on this volume has yet been done. >>>>>>>>>> How can I fix it? Can I delete these dead-symlinks? How can I fix all >>>>>>>>>> my split-brains? >>>>>>>>>> >>>>>>>>>> Here is an example of a ls: >>>>>>>>>> [root@cl-storage3 ~]# cd >>>>>>>>>> /export/brick_home/brick1/data/.glusterfs/7b/d2/ >>>>>>>>>> [root@cl-storage3 d2]# ll >>>>>>>>>> total 8,7M >>>>>>>>>> 13706 drwx------ 2 root root 8,0K 26 juil. >>>>>>>>>> 17:22 . >>>>>>>>>> 2147483784 drwx------ 258 root root 8,0K 20 juil. >>>>>>>>>> 23:07 .. >>>>>>>>>> 2148444137 -rwxrwxrwx 2 baaden baaden_team 173K 22 mai >>>>>>>>>> 2008 7bd200dd-1774-4395-9065-605ae30ec18b >>>>>>>>>> 1559384 -rw-rw-r-- 2 tarus amyloid_team 4,3K 19 juin >>>>>>>>>> 2013 7bd2155c-7a05-4edc-ae77-35ed7e16afbc >>>>>>>>>> 287295 lrwxrwxrwx 1 root root 58 20 juil. 
>>>>>>>>>> 23:38 7bd2370a-100b-411e-89a4-d184da9f0f88 -> >>>>>>>>>> ../../a7/59/a759de6f-cdf5-43dd-809a-baf81d103bf7/prop-base >>>>>>>>>> 2149090201 -rw-rw-r-- 2 tarus amyloid_team 76K 8 mars >>>>>>>>>> 2014 7bd2497f-d24b-4b19-a1c5-80a4956e56a1 >>>>>>>>>> 2148561174 -rw-r--r-- 2 tran derreumaux_team 575 14 févr. >>>>>>>>>> 07:54 7bd25db0-67f5-43e5-a56a-52cf8c4c60dd >>>>>>>>>> 1303943 -rw-r--r-- 2 tran derreumaux_team 576 10 févr. >>>>>>>>>> 06:06 7bd25e97-18be-4faf-b122-5868582b4fd8 >>>>>>>>>> 1308607 -rw-r--r-- 2 tran derreumaux_team 414K 16 juin >>>>>>>>>> 11:05 7bd2618f-950a-4365-a753-723597ef29f5 >>>>>>>>>> 45745 -rw-r--r-- 2 letessier admin_team 585 5 janv. >>>>>>>>>> 2012 7bd265c7-e204-4ee8-8717-e4a0c393fb0f >>>>>>>>>> 2148144918 -rw-rw-r-- 2 tarus amyloid_team 107K 28 févr. >>>>>>>>>> 2014 7bd26c5b-d48a-481a-9ca6-2dc27768b5ad >>>>>>>>>> 13705 -rw-rw-r-- 2 tarus amyloid_team 25K 4 juin >>>>>>>>>> 2014 7bd27e4c-46ba-4f21-a766-389bfa52fd78 >>>>>>>>>> 1633627 -rw-rw-r-- 2 tarus amyloid_team 75K 12 mars >>>>>>>>>> 2014 7bd28631-90af-4c16-8ff0-c3d46d5026c6 >>>>>>>>>> 1329165 -rw-r--r-- 2 tran derreumaux_team 175 15 juin >>>>>>>>>> 23:40 7bd2957e-a239-4110-b3d8-b4926c7f060b >>>>>>>>>> 797803 lrwxrwxrwx 2 baaden baaden_team 26 2 avril >>>>>>>>>> 2007 7bd29933-1c80-4c6b-ae48-e64e4da874cb -> >>>>>>>>>> ../divided/a7/2a7o.pdb1.gz >>>>>>>>>> 1532463 -rw-rw-rw- 2 baaden baaden_team 1,8M 2 nov. 
>>>>>>>>>> 2009 7bd29d70-aeb4-4eca-ac55-fae2d46ba911
>>>>>>>>>> 1411112 -rw-r--r-- 2 sterpone sterpone_team 3,1K 2 mai 2012 7bd2a5eb-62a4-47fc-b149-31e10bd3c33d
>>>>>>>>>> 2148865896 -rw-r--r-- 2 tran derreumaux_team 2,1M 15 juin 23:46 7bd2ae9c-18ca-471f-a54a-6e4aec5aea89
>>>>>>>>>> 2148762578 -rw-rw-r-- 2 tarus amyloid_team 154K 11 mars 2014 7bd2b7d7-7745-4842-b7b4-400791c1d149
>>>>>>>>>> 149216 -rw-r--r-- 2 vamparys sacquin_team 241K 17 mai 2013 7bd2ba98-6a42-40ea-87ea-acb607d73cb5
>>>>>>>>>> 2148977923 -rwxr-xr-x 2 murail baaden_team 23K 18 juin 2012 7bd2cf57-19e7-451c-885d-fd02fd988d43
>>>>>>>>>> 1176623 -rw-rw-r-- 2 tarus amyloid_team 227K 8 mars 2014 7bd2d92c-7ec8-4af8-9043-49d1908a99dc
>>>>>>>>>> 1172122 lrwxrwxrwx 2 sterpone sterpone_team 61 17 avril 12:49 7bd2d96e-e925-45f0-a26a-56b95c084122 -> ../../../../../src/libs/ck-libs/ParFUM-Tops-Dev/ParFUM_TOPS.h
>>>>>>>>>> 1385933 -rw-r--r-- 2 tran derreumaux_team 2,9M 16 juin 05:29 7bd2df54-17d2-4644-96b7-f8925a67ec1e
>>>>>>>>>> 745899 lrwxrwxrwx 1 root root 58 22 juil. 09:50 7bd2df83-ce58-4a17-aca8-a32b71e953d4 -> ../../5c/39/5c39010f-fa77-49df-8df6-8d72cf74fd64/model_009
>>>>>>>>>> 2149100186 -rw-rw-r-- 2 tarus amyloid_team 494K 17 mars 2014 7bd2e865-a2f4-4d90-ab29-dccebe2e3440
>>>>>>>>>>
>>>>>>>>>> Best.
>>>>>>>>>> Geoffrey
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>>>>>>
>>>>>>>>>> On 27 Jul 2015, at 22:57, Geoffrey Letessier
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear all,
>>>>>>>>>>>
>>>>>>>>>>> For several weeks now (more than a month), our computing
>>>>>>>>>>> production has been stopped because of several surprising
>>>>>>>>>>> problems with GlusterFS.
>>>>>>>>>>>
>>>>>>>>>>> After noticing a serious problem with incorrect quota sizes
>>>>>>>>>>> accounted for a great many files, I decided, under the guidance
>>>>>>>>>>> of the Gluster support team, to upgrade my storage cluster from
>>>>>>>>>>> version 3.5.3 to the latest one (3.7.2-3), because these bugs
>>>>>>>>>>> are in theory fixed in that branch. Since the upgrade, things
>>>>>>>>>>> are a complete mess and I cannot restart production.
>>>>>>>>>>> Specifically:
>>>>>>>>>>> 1 - The RDMA protocol does not work and hangs my system / shell
>>>>>>>>>>> commands; only the TCP protocol (over InfiniBand) is more or
>>>>>>>>>>> less operational. This is not a blocker, but…
>>>>>>>>>>> 2 - Read/write performance is relatively low.
>>>>>>>>>>> 3 - Thousands of split-brains have appeared.
>>>>>>>>>>>
>>>>>>>>>>> So, for the moment, I believe GlusterFS 3.7 is not really
>>>>>>>>>>> production ready.
>>>>>>>>>>>
>>>>>>>>>>> Concerning the third point: after destroying all my volumes
>>>>>>>>>>> (RAID re-init, new partitions, GlusterFS volumes, etc.) and
>>>>>>>>>>> recreating the main one, I tried to transfer my data back from
>>>>>>>>>>> the archive/backup server into this new volume, and I noticed a
>>>>>>>>>>> lot of errors in my mount log file, as you can see in this
>>>>>>>>>>> extract:
>>>>>>>>>>> [2015-07-26 22:35:16.962815] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>>>>>>>> [2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>> [2015-07-26 22:35:16.975206] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>>>>>>>> [2015-07-26 22:35:28.719935] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>> [2015-07-26 22:35:29.764891] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>>>>>>>> [2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>> [2015-07-26 22:35:29.775037] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>>>>>>>> [2015-07-26 22:35:29.776857] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>> [2015-07-26 22:35:29.800535] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0
>>>>>>>>>>>
>>>>>>>>>>> And when I try to browse some folders (still in the mount log
>>>>>>>>>>> file):
>>>>>>>>>>> [2015-07-27 09:00:19.005763] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>>>>>>>> [2015-07-27 09:00:22.322316] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>> [2015-07-27 09:00:23.008771] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>>>>>>>> [2015-07-27 08:59:50.359187] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
>>>>>>>>>>> [2015-07-27 09:00:02.500419] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
>>>>>>>>>>> [2015-07-27 09:00:02.506925] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
>>>>>>>>>>> [2015-07-27 09:00:23.001121] W [MSGID: 108008] [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
>>>>>>>>>>> [2015-07-27 09:00:26.231262] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>>>>>>
>>>>>>>>>>> And, above all, when browsing folders I get a lot of
>>>>>>>>>>> input/output errors.
>>>>>>>>>>>
>>>>>>>>>>> Currently I have 6.2M inodes and roughly 30TB in my "new" volume.
>>>>>>>>>>>
>>>>>>>>>>> For the moment, quota is disabled to improve I/O performance
>>>>>>>>>>> while the data is transferred back…
>>>>>>>>>>>
>>>>>>>>>>> You can also find attached:
>>>>>>>>>>> - an "ls" result
>>>>>>>>>>> - a split-brain search result
>>>>>>>>>>> - the volume information and status
>>>>>>>>>>> - a complete volume heal info
>>>>>>>>>>>
>>>>>>>>>>> I hope this helps you help me fix all these problems and resume
>>>>>>>>>>> computing production.
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>> Geoffrey
>>>>>>>>>>>
>>>>>>>>>>> PS: « Erreur d’Entrée/Sortie » = « Input / Output Error »
>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>>>>>>>
>>>>>>>>>>> <ls_example.txt>
>>>>>>>>>>> <split_brain__20150725.txt>
>>>>>>>>>>> <vol_home_healinfo.txt>
>>>>>>>>>>> <vol_home_info.txt>
>>>>>>>>>>> <vol_home_status.txt>
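
The mount-log extract quoted above repeats the same GFID mismatches many times, so it may help to reduce it to one entry per affected file. What follows is only a minimal sketch, assuming Python 3.6+ and only the two message shapes visible in the quoted log ("Gfid mismatch detected for <PARENT/NAME>, ..." and "GFID mismatch for <gfid:PARENT>/NAME ..."). The function name collect_gfid_mismatches and both regular expressions are hypothetical helpers written for illustration; they are not part of GlusterFS, and other GlusterFS versions may word these messages differently.

```python
import re
from collections import OrderedDict

# Textual form of a GFID as it appears in the quoted log lines.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Shape 1 (E lines): "Gfid mismatch detected for <PARENT/NAME>, A on X and B on Y."
ENTRY_RE = re.compile(
    rf"Gfid mismatch detected for <(?P<parent>{UUID})/(?P<name>[^>]+)>,\s+"
    rf"(?P<gfid_a>{UUID}) on (?P<client_a>\S+) and "
    rf"(?P<gfid_b>{UUID}) on (?P<client_b>[^\s.]+)"
)

# Shape 2 (W lines): "GFID mismatch for <gfid:PARENT>/NAME A on X and B on Y"
# Assumption: file names contain no whitespace in this shape.
NAME_RE = re.compile(
    rf"GFID mismatch for <gfid:(?P<parent>{UUID})>/(?P<name>\S+)\s+"
    rf"(?P<gfid_a>{UUID}) on (?P<client_a>\S+) and "
    rf"(?P<gfid_b>{UUID}) on (?P<client_b>\S+)"
)


def collect_gfid_mismatches(lines):
    """Map (parent_gfid, file_name) -> {client: gfid}, one entry per file."""
    found = OrderedDict()
    for line in lines:
        match = ENTRY_RE.search(line) or NAME_RE.search(line)
        if match is None:
            continue  # not a GFID-mismatch message
        key = (match["parent"], match["name"])
        found[key] = {
            match["client_a"]: match["gfid_a"],
            match["client_b"]: match["gfid_b"],
        }
    return found


# Three lines taken from the extract above (log entries joined onto one line).
SAMPLE = [
    "[2015-07-26 22:35:16.965896] E "
    "[afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] "
    "0-vol_home-replicate-0: Gfid mismatch detected for "
    "<865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, "
    "e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and "
    "820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. "
    "Skipping conservative merge on the file.",
    "[2015-07-26 22:35:29.768339] E "
    "[afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] "
    "0-vol_home-replicate-0: Gfid mismatch detected for "
    "<865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, "
    "e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and "
    "820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. "
    "Skipping conservative merge on the file.",
    "[2015-07-27 09:00:02.500419] W [MSGID: 108008] "
    "[afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] "
    "0-vol_home-replicate-0: GFID mismatch for "
    "<gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro "
    "b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and "
    "ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0",
]

if __name__ == "__main__":
    for (parent, name), gfids in collect_gfid_mismatches(SAMPLE).items():
        print(f"{parent}/{name}")
        for client, gfid in sorted(gfids.items()):
            print(f"    {client}: {gfid}")
```

To run it on a real log, the SAMPLE list could be replaced by an open file object for the client mount log; the path of that file depends on the mount point and is not given in this thread.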
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
