Thank you, Ashish. I will study and try your solution on my virtual env. How can I detect the process of a brick on a gluster server?
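Each brick is normally served by its own glusterfsd process, so on the brick's server it can usually be spotted either from the Pid column of the volume status output or by grepping the process list for the brick path; the volume name tier2 and the brick path below are only examples taken from this thread.

# one glusterfsd process per brick; the status output lists its port, online state and PID
gluster volume status tier2
# or locate it directly in the process list via its brick path
ps aux | grep '[g]lusterfsd' | grep '/gluster/mnt1/brick'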
Many Thanks, Mauro Il ven 28 set 2018 16:39 Ashish Pandey <[email protected]> ha scritto: > > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Friday, September 28, 2018 7:08:41 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Dear Ashish, > > please excuse me, I'm very sorry for misunderstanding. > Before contacting you during last days, we checked all network devices > (switch 10GbE, cables, NICs, servers ports, and so on), operating systems > version and settings, network bonding configuration, gluster packages > versions, tuning profiles, etc. but everything seems to be ok. The first 3 > servers (and volume) operated without problem for one year. After we added > the new 3 servers we noticed something wrong. > Fortunately, yesterday you gave me an hand to understand where is (or > could be) the problem. > > At this moment, after we re-launched the remove-brick command, it seems > that the rebalance is going ahead without errors, but it is only scanning > the files. > May be that during the future data movement some errors could appear. > > For this reason, it could be useful to know how to proceed in case of a > new failure: insist with approach n.1 or change the strategy? > We are thinking to try to complete the running remove-brick procedure and > make a decision based on the outcome. > > Question: could we start approach n.2 also after having successfully > removed the V1 subvolume?! > > >>> Yes, we can do that. My idea is to use replace-brick command. > We will kill "ONLY" one brick process on s06. We will format this brick. > Then use replace-brick command to replace brick of a volume on s05 with > this formatted brick. > heal will be triggered and data of the respective volume will be placed on > this brick. > > Now, we can format the brick which got freed up on s05 and replace the > brick which we killed on s06 to s05. > During this process, we have to make sure heal completed before trying any > other replace/kill brick. > > It is tricky but looks doable. Think about it and try to perform it on > your virtual environment first before trying on production. > ------- > > If it is still possible, could you please illustrate the approach n.2 even > if I dont have free disks? > I would like to start thinking about it and test it on a virtual > environment. > > Thank you in advance for your help and patience. > Regards, > Mauro > > > > Il giorno 28 set 2018, alle ore 14:36, Ashish Pandey <[email protected]> > ha scritto: > > > We could have taken approach -2 even if you did not have free disks. You > should have told me why are you > opting Approach-1 or perhaps I should have asked. > I was wondering for approach 1 because sometimes re-balance takes time > depending upon the data size. > > Anyway, I hope whole setup is stable, I mean it is not in the middle of > something which we can not stop. > If free disks are the only concern I will give you some more steps to deal > with it and follow the approach 2. > > Let me know once you think everything is fine with the system and there is > nothing to heal. 
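As a reference for the "make sure heal completed" checks between every kill/replace step mentioned above: the usual verification is the heal info output, and a subvolume is considered caught up only when every brick reports zero pending entries (tier2 is the volume name from this thread; the exact wording of the output may differ slightly between releases).

# full per-brick list of entries still waiting to be healed
gluster volume heal tier2 info
# condensed view: proceed only when every brick shows "Number of entries: 0"
gluster volume heal tier2 info | grep -E 'Brick|Number of entries'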
> > --- > Ashish > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Friday, September 28, 2018 4:21:03 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Hi Ashish, > > as I said in my previous message, we adopted the first approach you > suggested (setting network.ping-timeout option to 0). > This choice was due to the absence of empty brick to be used as indicated > in the second approach. > > So, we launched remove-brick command on the first subvolume (V1, bricks > 1,2,3,4,5,6 on server s04). > Rebalance started moving the data across the other bricks, but, after > about 3TB of moved data, rebalance speed slowed down and some transfer > errors appeared in the rebalance.log of server s04. > At this point, since remaining 1,8TB need to be moved in order to complete > the step, we decided to stop the remove-brick execution and start it again > (I hope it doesn’t stop again before complete the rebalance) > > Now rebalance is not moving data, it’s only scanning files (please, take a > look to the following output) > > [root@s01 ~]# gluster volume remove-brick tier2 > s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick > s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick > s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status > Node Rebalanced-files size > scanned failures skipped status run time in > h:m:s > --------- ----------- ----------- > ----------- ----------- ----------- ------------ > -------------- > s04-stg 0 0Bytes > 182008 0 0 in progress 3:08:09 > Estimated time left for rebalance to complete : 442:45:06 > > If I’m not wrong, remove-brick rebalances entire cluster each time it > start. > Is there a way to speed up this procedure? Do you have some other > suggestion that, in this particular case, could be useful to reduce errors > (I know that they are related to the current volume configuration) and > improve rebalance performance avoiding to rebalance the entire cluster? > > Thank you in advance, > Mauro > > Il giorno 27 set 2018, alle ore 13:14, Ashish Pandey <[email protected]> > ha scritto: > > > Yes, you can. > If not me others may also reply. > > --- > Ashish > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Thursday, September 27, 2018 4:24:12 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Dear Ashish, > > I can not thank you enough! > Your procedure and description is very detailed. > I think to follow the first approach after setting network.ping-timeout > option to 0 (If I’m not wrong “0" means “infinite”...I noticed that this > value reduced rebalance errors). > After the fix I will set network.ping-timeout option to default value. > > Could I contact you again if I need some kind of suggestion? > > Thank you very much again. > Have a good day, > Mauro > > > Il giorno 27 set 2018, alle ore 12:38, Ashish Pandey <[email protected]> > ha scritto: > > > Hi Mauro, > > We can divide the 36 newly added bricks into 6 set of 6 bricks each > starting from brick37. > That means, there are 6 ec subvolumes and we have to deal with one sub > volume at a time. > I have named it V1 to V6. > > Problem: > Take the case of V1. 
> The best configuration/setup would be to have all the 6 bricks of V1 on 6 > different nodes. > However, in your case you have added 3 new nodes. So, at least we should > have 2 bricks on 3 different newly added nodes. > This way, in 4+2 EC configuration, even if one node goes down you will > have 4 other bricks of that volume and the data on that volume would be > accessible. > In current setup if s04-stg goes down, you will loose all the data on V1 > and V2 as all the bricks will be down. We want to avoid and correct it. > > Now, we can have two approach to correct/modify this setup. > > *Approach 1* > We have to remove all the newly added bricks in a set of 6 bricks. This > will trigger re- balance and move whole data to other sub volumes. > Repeat the above step and then once all the bricks are removed, add those > bricks again in a set of 6 bricks, this time have 2 bricks from each of the > 3 newly added Nodes. > > While this is a valid and working approach, I personally think that this > will take long time and also require lot of movement of data. > > *Approach 2* > > In this approach we can use the heal process. We have to deal with all the > volumes (V1 to V6) one by one. Following are the steps for V1- > > *Step 1 - * > Use replace-brick command to move following bricks on *s05-stg* node *one > by one (heal should be completed after every replace brick command)* > > > *Brick39: s04-stg:/gluster/mnt3/brick to s05-stg/<brick which is free>* > > *Brick40: s04-stg:/gluster/mnt4/brick to s05-stg/<other brick which is > free>* > > Command : > gluster v replace-brick <volname> *s04-stg:/gluster/mnt3/brick* > *s05-stg:/<brick > which is free>* commit force > Try to give names to the bricks so that you can identify which 6 bricks > belongs to same ec subvolume > > > Use replace-brick command to move following bricks on *s06-stg* node one > by one > > Brick41: s04-stg:/gluster/mnt5/brick to *s06-stg/<brick which is free>* > Brick42: s04-stg:/gluster/mnt6/brick to *s06-stg/<other brick which is > free>* > > > *Step 2* - After, every replace-brick command, you have to wait for heal > to be completed. > check *"gluster v heal <volname> info "* if it shows any entry you have > to wait for it to be completed. > > After successful step 1 and step 2, setup for sub volume V1 will be fixed. > The same steps you have to perform for other volumes. Only thing is that > the nodes would be different on which you have to move the bricks. 
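Since Step 1 and Step 2 have to be repeated brick by brick, and later for every subvolume, it may help to wrap each move in a small wait-for-heal loop. This is only a sketch: the destination brick paths are placeholders exactly like the ones in the steps above, and the grep pattern assumes the "Number of entries:" wording printed by heal info on this release.

# rough helper: replace one brick, then block until heal info reports no pending entries
replace_and_wait() {
    gluster volume replace-brick tier2 "$1" "$2" commit force
    while gluster volume heal tier2 info | grep -q 'Number of entries: [1-9]'; do
        sleep 60   # keep waiting while any brick still has entries to heal
    done
}
# Step 1 for V1: move two of the s04 bricks to s05, one at a time
replace_and_wait s04-stg:/gluster/mnt3/brick s05-stg:/<brick which is free>
replace_and_wait s04-stg:/gluster/mnt4/brick s05-stg:/<other brick which is free>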
> > > > > V1 > > Brick37: s04-stg:/gluster/mnt1/brick > Brick38: s04-stg:/gluster/mnt2/brick > Brick39: s04-stg:/gluster/mnt3/brick > Brick40: s04-stg:/gluster/mnt4/brick > Brick41: s04-stg:/gluster/mnt5/brick > Brick42: s04-stg:/gluster/mnt6/brick > > V2 > Brick43: s04-stg:/gluster/mnt7/brick > Brick44: s04-stg:/gluster/mnt8/brick > Brick45: s04-stg:/gluster/mnt9/brick > Brick46: s04-stg:/gluster/mnt10/brick > Brick47: s04-stg:/gluster/mnt11/brick > Brick48: s04-stg:/gluster/mnt12/brick > > V3 > Brick49: s05-stg:/gluster/mnt1/brick > Brick50: s05-stg:/gluster/mnt2/brick > Brick51: s05-stg:/gluster/mnt3/brick > Brick52: s05-stg:/gluster/mnt4/brick > Brick53: s05-stg:/gluster/mnt5/brick > Brick54: s05-stg:/gluster/mnt6/brick > > V4 > Brick55: s05-stg:/gluster/mnt7/brick > Brick56: s05-stg:/gluster/mnt8/brick > Brick57: s05-stg:/gluster/mnt9/brick > Brick58: s05-stg:/gluster/mnt10/brick > Brick59: s05-stg:/gluster/mnt11/brick > Brick60: s05-stg:/gluster/mnt12/brick > > V5 > Brick61: s06-stg:/gluster/mnt1/brick > Brick62: s06-stg:/gluster/mnt2/brick > Brick63: s06-stg:/gluster/mnt3/brick > Brick64: s06-stg:/gluster/mnt4/brick > Brick65: s06-stg:/gluster/mnt5/brick > Brick66: s06-stg:/gluster/mnt6/brick > > V6 > Brick67: s06-stg:/gluster/mnt7/brick > Brick68: s06-stg:/gluster/mnt8/brick > Brick69: s06-stg:/gluster/mnt9/brick > Brick70: s06-stg:/gluster/mnt10/brick > Brick71: s06-stg:/gluster/mnt11/brick > Brick72: s06-stg:/gluster/mnt12/brick > > > Just a note that these steps need movement of data. > Be careful while performing these steps and do one replace brick at a time > and only after heal completion go to next. > Let me know if you have any issues. > > --- > Ashish > > > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Thursday, September 27, 2018 4:03:04 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Dear Ashish, > > I hope I don’t disturb you so much, but I would like to ask you if you had > some time to dedicate to our problem. > Please, forgive my insistence. > > Thank you in advance, > Mauro > > Il giorno 26 set 2018, alle ore 19:56, Mauro Tridici < > [email protected]> ha scritto: > > Hi Ashish, > > sure, no problem! We are a little bit worried, but we can wait :-) > Thank you very much for your support and your availability. > > Regards, > Mauro > > > Il giorno 26 set 2018, alle ore 19:33, Ashish Pandey <[email protected]> > ha scritto: > > Hi Mauro, > > Yes, I can provide you step by step procedure to correct it. > Is it fine If i provide you the steps tomorrow as it is quite late over > here and I don't want to miss anything in hurry? > > --- > Ashish > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Wednesday, September 26, 2018 6:54:19 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Hi Ashish, > > in attachment you can find the rebalance log file and the last updated > brick log file (the other files in /var/log/glusterfs/bricks directory seem > to be too old). > I just stopped the running rebalance (as you can see at the bottom of the > rebalance log file). > So, if exists a safe procedure to correct the problem I would like execute > it. 
> > I don’t know if I can ask you it, but, if it is possible, could you please > describe me step by step the right procedure to remove the newly added > bricks without losing the data that have been already rebalanced? > > The following outputs show the result of “df -h” command executed on one > of the first 3 nodes (s01, s02, s03) already existing and on one of the > last 3 nodes (s04, s05, s06) added recently. > > [root@s06 bricks]# df -h > File system Dim. Usati Dispon. Uso% Montato su > /dev/mapper/cl_s06-root 100G 2,1G 98G 3% / > devtmpfs 32G 0 32G 0% /dev > tmpfs 32G 4,0K 32G 1% /dev/shm > tmpfs 32G 26M 32G 1% /run > tmpfs 32G 0 32G 0% /sys/fs/cgroup > /dev/mapper/cl_s06-var 100G 2,0G 99G 2% /var > /dev/mapper/cl_s06-gluster 100G 33M 100G 1% /gluster > /dev/sda1 1014M 152M 863M 15% /boot > /dev/mapper/gluster_vgd-gluster_lvd 9,0T 807G 8,3T 9% /gluster/mnt3 > /dev/mapper/gluster_vgg-gluster_lvg 9,0T 807G 8,3T 9% /gluster/mnt6 > /dev/mapper/gluster_vgc-gluster_lvc 9,0T 807G 8,3T 9% /gluster/mnt2 > /dev/mapper/gluster_vge-gluster_lve 9,0T 807G 8,3T 9% /gluster/mnt4 > /dev/mapper/gluster_vgj-gluster_lvj 9,0T 887G 8,2T 10% /gluster/mnt9 > /dev/mapper/gluster_vgb-gluster_lvb 9,0T 807G 8,3T 9% /gluster/mnt1 > /dev/mapper/gluster_vgh-gluster_lvh 9,0T 887G 8,2T 10% /gluster/mnt7 > /dev/mapper/gluster_vgf-gluster_lvf 9,0T 807G 8,3T 9% /gluster/mnt5 > /dev/mapper/gluster_vgi-gluster_lvi 9,0T 887G 8,2T 10% /gluster/mnt8 > /dev/mapper/gluster_vgl-gluster_lvl 9,0T 887G 8,2T 10% /gluster/mnt11 > /dev/mapper/gluster_vgk-gluster_lvk 9,0T 887G 8,2T 10% /gluster/mnt10 > /dev/mapper/gluster_vgm-gluster_lvm 9,0T 887G 8,2T 10% /gluster/mnt12 > tmpfs 6,3G 0 6,3G 0% /run/user/0 > > [root@s01 ~]# df -h > File system Dim. Usati Dispon. Uso% Montato su > /dev/mapper/cl_s01-root 100G 5,3G 95G 6% / > devtmpfs 32G 0 32G 0% /dev > tmpfs 32G 39M 32G 1% /dev/shm > tmpfs 32G 26M 32G 1% /run > tmpfs 32G 0 32G 0% /sys/fs/cgroup > /dev/mapper/cl_s01-var 100G 11G 90G 11% /var > /dev/md127 1015M 151M 865M 15% /boot > /dev/mapper/cl_s01-gluster 100G 33M 100G 1% /gluster > /dev/mapper/gluster_vgi-gluster_lvi 9,0T 5,5T 3,6T 61% /gluster/mnt7 > /dev/mapper/gluster_vgm-gluster_lvm 9,0T 5,4T 3,6T 61% /gluster/mnt11 > /dev/mapper/gluster_vgf-gluster_lvf 9,0T 5,7T 3,4T 63% /gluster/mnt4 > /dev/mapper/gluster_vgl-gluster_lvl 9,0T 5,8T 3,3T 64% /gluster/mnt10 > /dev/mapper/gluster_vgj-gluster_lvj 9,0T 5,5T 3,6T 61% /gluster/mnt8 > /dev/mapper/gluster_vgn-gluster_lvn 9,0T 5,4T 3,6T 61% /gluster/mnt12 > /dev/mapper/gluster_vgk-gluster_lvk 9,0T 5,8T 3,3T 64% /gluster/mnt9 > /dev/mapper/gluster_vgh-gluster_lvh 9,0T 5,6T 3,5T 63% /gluster/mnt6 > /dev/mapper/gluster_vgg-gluster_lvg 9,0T 5,6T 3,5T 63% /gluster/mnt5 > /dev/mapper/gluster_vge-gluster_lve 9,0T 5,7T 3,4T 63% /gluster/mnt3 > /dev/mapper/gluster_vgc-gluster_lvc 9,0T 5,6T 3,5T 62% /gluster/mnt1 > /dev/mapper/gluster_vgd-gluster_lvd 9,0T 5,6T 3,5T 62% /gluster/mnt2 > tmpfs 6,3G 0 6,3G 0% /run/user/0 > s01-stg:tier2 420T 159T 262T 38% /tier2 > > As you can see, used space value of each brick of the last servers is > about 800GB. > > Thank you, > Mauro > > > > > > > > > Il giorno 26 set 2018, alle ore 14:51, Ashish Pandey <[email protected]> > ha scritto: > > Hi Mauro, > > rebalance and brick logs should be the first thing we should go through. > > There is a procedure to correct the configuration/setup but the situation > you are in is difficult to follow that procedure. 
> You should have added the bricks hosted on s04-stg, s05-stg and s06-stg > the same way you had the previous configuration. > That means 2 bricks on each node for one subvolume. > The procedure will require a lot of replace bricks which will again need > healing and all. In addition to that we have to wait for re-balance to > complete. > > I would suggest that if whole data has not been rebalanced and if you can > stop the rebalance and remove these newly added bricks properly then you > should remove these newly added bricks. > After that, add these bricks so that you have 2 bricks of each volume on 3 > newly added nodes. > > Yes, it is like undoing whole effort but it is better to do it now then > facing issues in future when it will be almost impossible to correct these > things if you have lots of data. > > --- > Ashish > > > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"Ashish Pandey" <[email protected]> > *Cc: *"gluster-users" <[email protected]> > *Sent: *Wednesday, September 26, 2018 5:55:02 PM > *Subject: *Re: [Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > > Dear Ashish, > > thank you for you answer. > I could provide you the entire log file related to glusterd, glusterfsd > and rebalance. > Please, could you indicate which one you need first? > > Yes, we added the last 36 bricks after creating vol. Is there a procedure > to correct this error? Is it still possible to do it? > > Many thanks, > Mauro > > Il giorno 26 set 2018, alle ore 14:13, Ashish Pandey <[email protected]> > ha scritto: > > > I think we don't have enough logs to debug this so I would suggest you to > provide more logs/info. > I have also observed that the configuration and setup of your volume is > not very efficient. > > For example: > Brick37: s04-stg:/gluster/mnt1/brick > Brick38: s04-stg:/gluster/mnt2/brick > Brick39: s04-stg:/gluster/mnt3/brick > Brick40: s04-stg:/gluster/mnt4/brick > Brick41: s04-stg:/gluster/mnt5/brick > Brick42: s04-stg:/gluster/mnt6/brick > Brick43: s04-stg:/gluster/mnt7/brick > Brick44: s04-stg:/gluster/mnt8/brick > Brick45: s04-stg:/gluster/mnt9/brick > Brick46: s04-stg:/gluster/mnt10/brick > Brick47: s04-stg:/gluster/mnt11/brick > Brick48: s04-stg:/gluster/mnt12/brick > > These 12 bricks are on same node and the sub volume made up of these > bricks will be of same subvolume, which is not good. Same is true for the > bricks hosted on s05-stg and s06-stg > I think you have added these bricks after creating vol. The probability of > disruption in connection of these bricks will be higher in this case. > > --- > Ashish > > ------------------------------ > *From: *"Mauro Tridici" <[email protected]> > *To: *"gluster-users" <[email protected]> > *Sent: *Wednesday, September 26, 2018 3:38:35 PM > *Subject: *[Gluster-users] Rebalance failed on Distributed Disperse > volume based on 3.12.14 version > > Dear All, Dear Nithya, > > after upgrading from 3.10.5 version to 3.12.14, I tried to start a > rebalance process to distribute data across the bricks, but something goes > wrong. > Rebalance failed on different nodes and the time value needed to complete > the procedure seems to be very high. 
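The "remove these newly added bricks properly, then add them back" path suggested above normally goes through the three-phase remove-brick lifecycle, so data is migrated off the bricks before they are detached. A hedged sketch for the V1 set only: commit only once status reports the migration as completed, wipe the removed brick directories before reusing them, and the paths in the final add-brick line are placeholders for clean, empty bricks spread two per node.

# start draining one 6-brick disperse set; data is rebalanced away in the background
gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick start
# watch progress; do not commit while it is still "in progress" or reporting failures
gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
# once completed, detach the bricks for good
gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick commit
# re-add six clean bricks as one new disperse set, two per newly added node
gluster volume add-brick tier2 s04-stg:/<clean brick 1> s04-stg:/<clean brick 2> s05-stg:/<clean brick 1> s05-stg:/<clean brick 2> s06-stg:/<clean brick 1> s06-stg:/<clean brick 2>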
> > [root@s01 ~]# gluster volume rebalance tier2 status > Node Rebalanced-files size > scanned failures skipped status run time in > h:m:s > --------- > ----------- ----------- ----------- ----------- ----------- > ------------ -------------- > localhost 19 161.6GB > 537 2 2 in progress 0:32:23 > s02-stg 25 212.7GB > 526 5 2 in progress 0:32:25 > s03-stg 4 69.1GB > 511 0 0 in progress 0:32:25 > s04-stg 4 484Bytes > 12283 0 3 in progress 0:32:25 > s05-stg 23 484Bytes > 11049 0 10 in progress 0:32:25 > s06-stg 3 1.2GB > 8032 11 3 failed 0:17:57 > Estimated time left for rebalance to complete : 3601:05:41 > volume rebalance: tier2: success > > When rebalance processes fail, I can see the following kind of errors in > /var/log/glusterfs/tier2-rebalance.log > > Error type 1) > > [2018-09-26 08:50:19.872575] W [MSGID: 122053] > [ec-common.c:269:ec_check_status] 0-tier2-disperse-10: Operation failed on > 2 of 6 subvolumes.(up=111111, mask=100111, remaining= > 000000, good=100111, bad=011000) > [2018-09-26 08:50:19.901792] W [MSGID: 122053] > [ec-common.c:269:ec_check_status] 0-tier2-disperse-11: Operation failed on > 1 of 6 subvolumes.(up=111111, mask=111101, remaining= > 000000, good=111101, bad=000010) > > Error type 2) > > [2018-09-26 08:53:31.566836] W [socket.c:600:__socket_rwv] > 0-tier2-client-53: readv on 192.168.0.55:49153 failed (Connection reset > by peer) > > Error type 3) > > [2018-09-26 08:57:37.852590] W [MSGID: 122035] > [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation > with some subvolumes unavailable (10) > [2018-09-26 08:57:39.282306] W [MSGID: 122035] > [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation > with some subvolumes unavailable (10) > [2018-09-26 09:02:04.928408] W [MSGID: 109023] > [dht-rebalance.c:1013:__dht_check_free_space] 0-tier2-dht: data movement of > file {blocks:0 name:(/OPA/archive/historical/dts/MRE > A/Observations/Observations/MREA14/Cs-1/CMCC/raw/CS013.ext)} would result > in dst node (tier2-disperse-5:2440190848) having lower disk space than the > source node (tier2-dispers > e-11:71373083776).Skipping file. > > Error type 4) > > W [rpc-clnt-ping.c:223:rpc_clnt_ping_cbk] 0-tier2-client-7: socket > disconnected > > Error type 5) > > [2018-09-26 09:07:42.333720] W [glusterfsd.c:1375:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7e25) [0x7f0417e0ee25] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55 > 90086004b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55900860032b] > ) 0-: received signum (15), shutting down > > Error type 6) > > [2018-09-25 08:09:18.340658] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-4: server > 192.168.0.52:49153 has not responded in the last 42 seconds, > disconnecting. > > It seems that there are some network or timeout problems, but the network > usage/traffic values are not so high. > Do you think that, in my volume configuration, I have to modify some > volume options related to thread and/or network parameters? > Could you, please, help me to understand the cause of the problems above? > > You can find below our volume info: > (volume is implemented on 6 servers; each server configuration: 2 cpu > 10-cores, 64GB RAM, 1 SSD dedicated to the OS, 12 x 10TB HD) > > [root@s04 ~]# gluster vol info
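Regarding the timeout-style messages above (error types 4 and 6) and the idea, discussed earlier in the thread, of relaxing network.ping-timeout while the rebalance runs: the option can be inspected, changed and later reset per volume. The 42 seconds in error type 6 is the shipped default, and 0 disables the timeout entirely, so it should only ever be a temporary measure during maintenance.

# current effective value for the volume
gluster volume get tier2 network.ping-timeout
# temporarily disable the ping timeout while the remove-brick/rebalance is running
gluster volume set tier2 network.ping-timeout 0
# put the default back once the maintenance is finished
gluster volume reset tier2 network.ping-timeout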
