Re: [Gluster-users] heaps split-brains during back-transfert

Geoffrey Letessier Fri, 31 Jul 2015 15:41:47 -0700

Hello,

As Krutika suggested, I successfully resolved all the split-brains (more than 
3450) that appeared after the first data transfer from one backup server to my 
fresh new volume, but…


The next step in validating my new volume was to enable quota on it. Now, 
more than one day after enabling it, the reported usage is still completely 
wrong.
Example:
# df -h /home/sterpone_team
Filesystem            Size  Used Avail Use% Mounted on
ib-storage1:vol_home.tcp
                       14T  3,3T   11T  24% /home
# pdsh -w storage[1,3] du -sh /export/brick_home/brick{1,2}/data/sterpone_team
storage3: 2,5T  /export/brick_home/brick1/data/sterpone_team
storage3: 2,4T  /export/brick_home/brick2/data/sterpone_team
storage1: 2,7T  /export/brick_home/brick1/data/sterpone_team
storage1: 2,4T  /export/brick_home/brick2/data/sterpone_team
As you can see, the data for this account totals around 10TB, yet quota 
reports only 3.3TB used.

Worse:
# pdsh -w storage[1,3] du -sh /export/brick_home/brick{1,2}/data/baaden_team
storage3: 2,9T  /export/brick_home/brick1/data/baaden_team
storage3: 2,7T  /export/brick_home/brick2/data/baaden_team
storage1: 3,2T  /export/brick_home/brick1/data/baaden_team
storage1: 2,8T  /export/brick_home/brick2/data/baaden_team
# df -h /home/baaden_team/
Filesystem            Size  Used Avail Use% Mounted on
ib-storage1:vol_home.tcp
                       20T  786G   20T   4% /home
# gluster volume quota vol_home list /baaden_team
                  Path                   Hard-limit Soft-limit   Used   Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/baaden_team                              20.0TB       80%     785.6GB  19.2TB              No                   No
This account holds around 11.6TB, yet quota detects only 786GB used…
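
The comparison above can be repeated for any directory with a small script. 
This is only a sketch: the helper names (to_tib, sum_du) are made up for 
illustration, and it just adds up the per-brick "du -sh" figures the same way 
as above. It does not know which bricks are replicas of one another, so the 
total is a raw on-disk figure.

```python
import re

# Multipliers that convert the suffixes printed by `du -h` / `df -h` to TiB.
UNITS = {"K": 1 / 1024**3, "M": 1 / 1024**2, "G": 1 / 1024, "T": 1.0, "P": 1024.0}


def to_tib(size: str) -> float:
    """Convert a human-readable size such as '2,5T' or '786G' to TiB.

    Accepts a decimal comma (French locale) as well as a decimal point.
    """
    match = re.fullmatch(r"([0-9]+(?:[.,][0-9]+)?)([KMGTP])", size.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return float(number.replace(",", ".")) * UNITS[unit]


def sum_du(pdsh_output: str) -> float:
    """Sum the sizes in `pdsh ... du -sh` output lines ('host: SIZE  PATH')."""
    total = 0.0
    for line in pdsh_output.strip().splitlines():
        _host, size, _path = line.split(None, 2)
        total += to_tib(size)
    return total


# Output copied from the baaden_team example above.
BAADEN_DU = """\
storage3: 2,9T  /export/brick_home/brick1/data/baaden_team
storage3: 2,7T  /export/brick_home/brick2/data/baaden_team
storage1: 3,2T  /export/brick_home/brick1/data/baaden_team
storage1: 2,8T  /export/brick_home/brick2/data/baaden_team
"""

on_bricks = sum_du(BAADEN_DU)
reported = to_tib("786G")  # "Used" column of `df -h /home/baaden_team/`
print(f"on bricks: {on_bricks:.1f}T, reported by quota: {reported:.2f}T")
```

Pasting the sterpone_team output into sum_du in the same way gives the figure 
to compare against the 3,3T reported by df.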

Can someone help me fix this? Note that I upgraded GlusterFS from 3.5.3 to 
3.7.2 precisely to solve a similar problem… 

For information, here is what the quotad log file shows:
[2015-07-31 22:13:00.574361] I [MSGID: 114047] 
[client-handshake.c:1225:client_setvolume_cbk] 0-vol_home-client-7: Server and 
Client lk-version numbers are not same, reopening the fds
[2015-07-31 22:13:00.574507] I [MSGID: 114035] 
[client-handshake.c:193:client_set_lk_version_cbk] 0-vol_home-client-7: Server 
lk version = 1

Is there a causal connection (a client/server version conflict)?

Here is what I noticed in my /var/log/glusterfs/quota-mount-vol_home.log file:
… <same kind of lines>
[2015-07-31 21:26:15.247269] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 
0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:15.250272] E [socket.c:2332:socket_connect_finish] 
0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
[2015-07-31 21:26:19.250545] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 
0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:19.253643] E [socket.c:2332:socket_connect_finish] 
0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
… <same kind of lines>

<A few minutes later:> OK, this was due to one brick being down. It's 
strange: since I upgraded GlusterFS to 3.7.x, I have noticed many bricks 
going down, sometimes a few moments after starting the volume, sometimes after 
a couple of days or weeks… This never happened with GlusterFS versions 3.3.1 
and 3.5.3.
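
Repeated "connection ... failed" lines like the ones quoted above can be 
grouped per client to spot a down brick quickly. The following is a minimal 
sketch, not a Gluster tool: the function name failed_connections is 
hypothetical, and the pattern only assumes the log line layout shown in this 
message.

```python
import re
from collections import Counter

# Matches the "connection to HOST:PORT failed" error lines quoted above.
FAILED = re.compile(
    r"E \[socket\.c:\d+:socket_connect_finish\]\s+"
    r"(?P<client>\S+): connection to (?P<addr>\S+) failed"
)


def failed_connections(log_text: str) -> Counter:
    """Count failed connection attempts per (client translator, address)."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    joined = " ".join(log_text.split())
    return Counter((m["client"], m["addr"]) for m in FAILED.finditer(joined))


# Lines copied from the quota-mount-vol_home.log extract above.
SAMPLE = """\
[2015-07-31 21:26:15.247269] I [rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:15.250272] E [socket.c:2332:socket_connect_finish]
0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
[2015-07-31 21:26:19.250545] I [rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol_home-client-5: changing port to 49162 (from 0)
[2015-07-31 21:26:19.253643] E [socket.c:2332:socket_connect_finish]
0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
"""

for (client, addr), count in failed_connections(SAMPLE).items():
    print(f"{client} -> {addr}: {count} failed attempts")
```

A client that keeps failing against the same address and port is a hint to 
check the corresponding brick process in "gluster volume status".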

Now I need to stop and start the volume because I am again seeing hangs with 
"gluster volume quota … ", "df", etc. Once again, I never noticed this kind 
of hang with the previous versions of GlusterFS I used; is it "expected"?

Once again, thank you very much in advance.
Geoffrey

------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

On 31 Jul 2015, at 11:26, Niels de Vos <[email protected]> wrote:

> On Wed, Jul 29, 2015 at 12:44:38AM +0200, Geoffrey Letessier wrote:
>> OK, thank you Niels for this explanation. Now, this makes sense.
>> 
>> And concerning all the split-brains that appeared during the back-transfer, 
>> do you have any idea where they are coming from?
> 
> Sorry, no, I don't know how that is happening in your environment. I'll
> try to find someone that understands more about it and can help you with
> that.
> 
> Niels
> 
>> 
>> Best,
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> IT manager & systems engineer
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: [email protected]
>> 
>> On 29 Jul 2015, at 00:02, Niels de Vos <[email protected]> wrote:
>> 
>>> On Tue, Jul 28, 2015 at 03:46:37PM +0200, Geoffrey Letessier wrote:
>>>> Hi,
>>>> 
>>>> In addition to all the split-brains reported, is it normal to find
>>>> thousands and thousands (several tens, even hundreds, of thousands) of
>>>> broken symlinks when browsing the .glusterfs directory on each brick? 
>>> 
>>> Yes, I think it is normal. A symlink points to a particular filename,
>>> possibly in a different directory. If the target file is located on a
>>> different brick, the symlink points to a non-local file.
>>> 
>>> Consider this example with two bricks in a distributed volume:
>>> - file: README
>>> - symlink: IMPORTANT -> README
>>> 
>>> When the distribution algorithm is done, README 'hashes' to brick-A. The
>>> symlink 'hashes' to brick-B. This means that README will be located on
>>> brick-A, and the symlink with name IMPORTANT would be located on
>>> brick-B. Because README is not on the same brick as IMPORTANT, the
>>> symlink points to the non-existing file README on brick-B.
>>> 
>>> However, when a Gluster client reads the target of symlink IMPORTANT,
>>> the Gluster client calculates the location of README and will know that
>>> README can be found on brick-A.
>>> 
>>> I hope that makes sense?
>>> 
>>> Niels
>>> 
>>> 
>>>> For the moment, I have just synchronized one remote directory (around 30TB
>>>> and a few million files) into my new volume. No other operations on
>>>> files on this volume have been done yet.
>>>> How can I fix it? Can I delete these dead symlinks? How can I fix all
>>>> my split-brains? 
>>>> 
>>>> Here is an example of an ls:
>>>> [root@cl-storage3 ~]# cd /export/brick_home/brick1/data/.glusterfs/7b/d2/
>>>> [root@cl-storage3 d2]# ll
>>>> total 8,7M
>>>>    13706 drwx------   2 root      root            8,0K 26 juil. 17:22 .
>>>> 2147483784 drwx------ 258 root      root            8,0K 20 juil. 23:07 ..
>>>> 2148444137 -rwxrwxrwx   2 baaden    baaden_team     173K 22 mai    2008 
>>>> 7bd200dd-1774-4395-9065-605ae30ec18b
>>>>  1559384 -rw-rw-r--   2 tarus     amyloid_team    4,3K 19 juin   2013 
>>>> 7bd2155c-7a05-4edc-ae77-35ed7e16afbc
>>>>   287295 lrwxrwxrwx   1 root      root              58 20 juil. 23:38 
>>>> 7bd2370a-100b-411e-89a4-d184da9f0f88 -> 
>>>> ../../a7/59/a759de6f-cdf5-43dd-809a-baf81d103bf7/prop-base
>>>> 2149090201 -rw-rw-r--   2 tarus     amyloid_team     76K  8 mars   2014 
>>>> 7bd2497f-d24b-4b19-a1c5-80a4956e56a1
>>>> 2148561174 -rw-r--r--   2 tran      derreumaux_team  575 14 févr. 07:54 
>>>> 7bd25db0-67f5-43e5-a56a-52cf8c4c60dd
>>>>  1303943 -rw-r--r--   2 tran      derreumaux_team  576 10 févr. 06:06 
>>>> 7bd25e97-18be-4faf-b122-5868582b4fd8
>>>>  1308607 -rw-r--r--   2 tran      derreumaux_team 414K 16 juin  11:05 
>>>> 7bd2618f-950a-4365-a753-723597ef29f5
>>>>    45745 -rw-r--r--   2 letessier admin_team       585  5 janv.  2012 
>>>> 7bd265c7-e204-4ee8-8717-e4a0c393fb0f
>>>> 2148144918 -rw-rw-r--   2 tarus     amyloid_team    107K 28 févr.  2014 
>>>> 7bd26c5b-d48a-481a-9ca6-2dc27768b5ad
>>>>    13705 -rw-rw-r--   2 tarus     amyloid_team     25K  4 juin   2014 
>>>> 7bd27e4c-46ba-4f21-a766-389bfa52fd78
>>>>  1633627 -rw-rw-r--   2 tarus     amyloid_team     75K 12 mars   2014 
>>>> 7bd28631-90af-4c16-8ff0-c3d46d5026c6
>>>>  1329165 -rw-r--r--   2 tran      derreumaux_team  175 15 juin  23:40 
>>>> 7bd2957e-a239-4110-b3d8-b4926c7f060b
>>>>   797803 lrwxrwxrwx   2 baaden    baaden_team       26  2 avril  2007 
>>>> 7bd29933-1c80-4c6b-ae48-e64e4da874cb -> ../divided/a7/2a7o.pdb1.gz
>>>>  1532463 -rw-rw-rw-   2 baaden    baaden_team     1,8M  2 nov.   2009 
>>>> 7bd29d70-aeb4-4eca-ac55-fae2d46ba911
>>>>  1411112 -rw-r--r--   2 sterpone  sterpone_team   3,1K  2 mai    2012 
>>>> 7bd2a5eb-62a4-47fc-b149-31e10bd3c33d
>>>> 2148865896 -rw-r--r--   2 tran      derreumaux_team 2,1M 15 juin  23:46 
>>>> 7bd2ae9c-18ca-471f-a54a-6e4aec5aea89
>>>> 2148762578 -rw-rw-r--   2 tarus     amyloid_team    154K 11 mars   2014 
>>>> 7bd2b7d7-7745-4842-b7b4-400791c1d149
>>>>   149216 -rw-r--r--   2 vamparys  sacquin_team    241K 17 mai    2013 
>>>> 7bd2ba98-6a42-40ea-87ea-acb607d73cb5
>>>> 2148977923 -rwxr-xr-x   2 murail    baaden_team      23K 18 juin   2012 
>>>> 7bd2cf57-19e7-451c-885d-fd02fd988d43
>>>>  1176623 -rw-rw-r--   2 tarus     amyloid_team    227K  8 mars   2014 
>>>> 7bd2d92c-7ec8-4af8-9043-49d1908a99dc
>>>>  1172122 lrwxrwxrwx   2 sterpone  sterpone_team     61 17 avril 12:49 
>>>> 7bd2d96e-e925-45f0-a26a-56b95c084122 -> 
>>>> ../../../../../src/libs/ck-libs/ParFUM-Tops-Dev/ParFUM_TOPS.h
>>>>  1385933 -rw-r--r--   2 tran      derreumaux_team 2,9M 16 juin  05:29 
>>>> 7bd2df54-17d2-4644-96b7-f8925a67ec1e
>>>>   745899 lrwxrwxrwx   1 root      root              58 22 juil. 09:50 
>>>> 7bd2df83-ce58-4a17-aca8-a32b71e953d4 -> 
>>>> ../../5c/39/5c39010f-fa77-49df-8df6-8d72cf74fd64/model_009
>>>> 2149100186 -rw-rw-r--   2 tarus     amyloid_team    494K 17 mars   2014 
>>>> 7bd2e865-a2f4-4d90-ab29-dccebe2e3440
>>>> 
>>>> 
>>>> 
>>>> Best.
>>>> Geoffrey
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> IT manager & systems engineer
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>> 
>>>> On 27 Jul 2015, at 22:57, Geoffrey Letessier <[email protected]> 
>>>> wrote:
>>>> 
>>>>> Dear all,
>>>>> 
>>>>> For several weeks (more than one month), our computing production has 
>>>>> been stopped due to several surprising problems with GlusterFS. 
>>>>> 
>>>>> After noticing a big problem with incorrect quota sizes accounted 
>>>>> for a great many files, I decided, under the guidance of the Gluster 
>>>>> support team, to upgrade my storage cluster from version 3.5.3 to the 
>>>>> latest (3.7.2-3) because these bugs are theoretically fixed in this 
>>>>> branch. Now, since I did this upgrade, it is a real mess and I cannot 
>>>>> restart production.
>>>>> Specifically:
>>>>>   1 - the RDMA protocol is not working and hangs my system / shell 
>>>>> commands; only the TCP protocol (over InfiniBand) is more or less 
>>>>> operational; it is not a blocking point but… 
>>>>>   2 - read/write performance is relatively low
>>>>>   3 - thousands of split-brains have appeared.
>>>>> 
>>>>> So, for the moment, I believe GlusterFS 3.7 is not really production 
>>>>> ready. 
>>>>> 
>>>>> Concerning the third point: after destroying all my volumes (RAID 
>>>>> re-init, new partitions, GlusterFS volumes, etc.) and recreating the main 
>>>>> one, I tried to transfer my data back from the archive/backup server into 
>>>>> this new volume, and I see a lot of errors in my mount log file, as you 
>>>>> can read in this extract:
>>>>> [2015-07-26 22:35:16.962815] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>> [2015-07-26 22:35:16.965896] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, 
>>>>> e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 
>>>>> 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> [2015-07-26 22:35:16.975206] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>> [2015-07-26 22:35:28.719935] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>,
>>>>>  951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 
>>>>> 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> [2015-07-26 22:35:29.764891] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>> [2015-07-26 22:35:29.768339] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, 
>>>>> e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 
>>>>> 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> [2015-07-26 22:35:29.775037] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>> [2015-07-26 22:35:29.776857] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>,
>>>>>  951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 
>>>>> 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> [2015-07-26 22:35:29.800535] W [MSGID: 108008] 
>>>>> [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 
>>>>> 0-vol_home-replicate-0: GFID mismatch for 
>>>>> <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt
>>>>>  951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 
>>>>> 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0
>>>>> 
>>>>> And when I try to browse some folders (still in the mount log file):
>>>>> [2015-07-27 09:00:19.005763] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>> [2015-07-27 09:00:22.322316] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 
>>>>> 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 
>>>>> 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> [2015-07-27 09:00:23.008771] I 
>>>>> [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: 
>>>>> performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>> [2015-07-27 08:59:50.359187] W [MSGID: 108008] 
>>>>> [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 
>>>>> 0-vol_home-replicate-0: GFID mismatch for 
>>>>> <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 
>>>>> 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 
>>>>> 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
>>>>> [2015-07-27 09:00:02.500419] W [MSGID: 108008] 
>>>>> [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 
>>>>> 0-vol_home-replicate-0: GFID mismatch for 
>>>>> <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro 
>>>>> b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and 
>>>>> ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
>>>>> [2015-07-27 09:00:02.506925] W [MSGID: 108008] 
>>>>> [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 
>>>>> 0-vol_home-replicate-0: GFID mismatch for 
>>>>> <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 
>>>>> 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and 
>>>>> e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
>>>>> [2015-07-27 09:00:23.001121] W [MSGID: 108008] 
>>>>> [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable 
>>>>> subvolume -1 found with event generation 2. (Possible split-brain)
>>>>> [2015-07-27 09:00:26.231262] E 
>>>>> [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 
>>>>> 0-vol_home-replicate-0: Gfid mismatch detected for 
>>>>> <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 
>>>>> 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 
>>>>> 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping 
>>>>> conservative merge on the file.
>>>>> 
>>>>> And, above all, when browsing folders I get a lot of input/output errors.
>>>>> 
>>>>> Currently I have 6.2M inodes and roughly 30TB in my "new" volume.
>>>>> 
>>>>> For the moment, quota is disabled to improve I/O performance during 
>>>>> the back-transfer… 
>>>>> 
>>>>> You can also find in the attachments:
>>>>>   - an "ls" result
>>>>>   - a split-brain search result
>>>>>   - the volume information and status
>>>>>   - a complete volume heal info
>>>>> 
>>>>> Hoping this can help you to help me fix all my problems and restart 
>>>>> computing production.
>>>>> 
>>>>> Thanks in advance,
>>>>> Geoffrey
>>>>> 
>>>>> PS: « Erreur d’Entrée/Sortie » = « Input / Output Error » 
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> IT manager & systems engineer
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>> 
>>>>> <ls_example.txt>
>>>>> <split_brain__20150725.txt>
>>>>> <vol_home_healinfo.txt>
>>>>> <vol_home_info.txt>
>>>>> <vol_home_status.txt>
>>>> 
>>

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
