On 1 October 2018 at 15:35, Mauro Tridici <[email protected]> wrote:
> Good morning Ashish,
>
> Your explanations are always very useful, thank you very much: I will
> keep these suggestions in mind for any future needs.
> Anyway, during the weekend the remove-brick procedures ended
> successfully and we were able to free up all bricks defined on server
> s04, s05 and 6 bricks of 12 on server s06.
> So we can say that, thanks to your suggestions, we are about to complete
> this first phase (removal of all bricks defined on the s04, s05 and s06
> servers).
>
> I really appreciated your support.
> Now I have one last question (I hope): after remove-brick commit I
> noticed that some data remain on each brick (about 1.2GB of data).
> Please take a look at “df-h_on_s04_s05_s06.txt”.
> The situation is almost the same on all 3 servers mentioned above: a
> long list of directory names and some files that are still on the
> brick, but whose size is 0.
>
> Examples:
>
> a lot of empty directories in /gluster/mnt*/brick/.glusterfs
>
> 8 /gluster/mnt2/brick/.glusterfs/b7/1b
> 0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee94a5-a77c-4c02-85a5-085992840c83
> 0 /gluster/mnt2/brick/.glusterfs/b7/ee/b7ee85d4-ce48-43a7-a89a-69c728ee8273
>
> some empty files in directories under /gluster/mnt*/brick/*
>
> [root@s04 ~]# cd /gluster/mnt1/brick/
> [root@s04 brick]# ls -l
> totale 32
> drwxr-xr-x 7 root root 100 11 set 22.14 archive_calypso
>
> [root@s04 brick]# cd archive_calypso/
> [root@s04 archive_calypso]# ll
> totale 0
> drwxr-x--- 3 root 5200 29 11 set 22.13 ans002
> drwxr-x--- 3 5104 5100 32 11 set 22.14 ans004
> drwxr-x--- 3 4506 4500 31 11 set 22.14 ans006
> drwxr-x--- 3 4515 4500 28 11 set 22.14 ans015
> drwxr-x--- 4 4321 4300 54 11 set 22.14 ans021
> [root@s04 archive_calypso]# du -a *
> 0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5/echam_sf006_198110.01.gz
> 0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0/echam5
> 0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.0
> 0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198105.01.gz
> 0 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5/echam_sf006_198109.01.gz
> 8 ans002/archive/ans002/HINDCASTS/RUN_ATMWANG_LANSENS/19810501.1/echam5
>
> What should we do with this data? Should I back up these “empty” dirs
> and files on different storage before deleting them?

Hi Mauro,

Are you sure these files and directories are empty? Please provide the
ls -l output for the files. If they are 'T' files, they can be ignored.

Regards,
Nithya

> As soon as all the bricks are empty, I plan to re-add the new bricks
> using the following commands:
>
> gluster peer detach s04
> gluster peer detach s05
> gluster peer detach s06
>
> gluster peer probe s04
> gluster peer probe s05
> gluster peer probe s06
>
> gluster volume add-brick tier2
>   s04-stg:/gluster/mnt1/brick s05-stg:/gluster/mnt1/brick s06-stg:/gluster/mnt1/brick
>   s04-stg:/gluster/mnt2/brick s05-stg:/gluster/mnt2/brick s06-stg:/gluster/mnt2/brick
>   s04-stg:/gluster/mnt3/brick s05-stg:/gluster/mnt3/brick s06-stg:/gluster/mnt3/brick
>   s04-stg:/gluster/mnt4/brick s05-stg:/gluster/mnt4/brick s06-stg:/gluster/mnt4/brick
>   s04-stg:/gluster/mnt5/brick s05-stg:/gluster/mnt5/brick s06-stg:/gluster/mnt5/brick
>   s04-stg:/gluster/mnt6/brick s05-stg:/gluster/mnt6/brick s06-stg:/gluster/mnt6/brick
>   s04-stg:/gluster/mnt7/brick s05-stg:/gluster/mnt7/brick s06-stg:/gluster/mnt7/brick
>   s04-stg:/gluster/mnt8/brick s05-stg:/gluster/mnt8/brick s06-stg:/gluster/mnt8/brick
>   s04-stg:/gluster/mnt9/brick s05-stg:/gluster/mnt9/brick s06-stg:/gluster/mnt9/brick
>   s04-stg:/gluster/mnt10/brick s05-stg:/gluster/mnt10/brick s06-stg:/gluster/mnt10/brick
>   s04-stg:/gluster/mnt11/brick s05-stg:/gluster/mnt11/brick s06-stg:/gluster/mnt11/brick
>   s04-stg:/gluster/mnt12/brick s05-stg:/gluster/mnt12/brick s06-stg:/gluster/mnt12/brick force
>
> gluster volume rebalance tier2 fix-layout start
>
> gluster volume rebalance tier2 start
>
> From your point of view, are these the right commands to close this
> repair task?
>
> Thank you very much for your help.
> Regards,
> Mauro
>
> On 1 Oct 2018, at 09:17, Ashish Pandey <[email protected]> wrote:
>
> Ohh!! It is because brick multiplexing is "ON" on your setup. Not sure
> whether it is ON by default in 3.12.14 or not.
>
> See "cluster.brick-multiplex: on" in gluster v <volname> info.
> If brick multiplexing is ON, you will see only one process running for
> all the bricks on a node.
>
> So we have to do the following steps to kill any one brick on a node.
>
> Steps to kill a brick when multiplexing is on -
>
> Step - 1
> Find the unix domain socket of the process on a node.
> Run "ps -aef | grep glusterfsd" on the node. Example:
>
> This is on my machine, where I have all the bricks on the same machine:
>
> [root@apandey glusterfs]# ps -aef | grep glusterfsd | grep -v mnt
> root 28311 1 0 11:16 ? 00:00:06 /usr/local/sbin/glusterfsd -s apandey
>   --volfile-id vol.apandey.home-apandey-bricks-gluster-vol-1
>   -p /var/run/gluster/vols/vol/apandey-home-apandey-bricks-gluster-vol-1.pid
>   -S /var/run/gluster/1259033d2ff4f4e5.socket
>   --brick-name /home/apandey/bricks/gluster/vol-1
>   -l /var/log/glusterfs/bricks/home-apandey-bricks-gluster-vol-1.log
>   --xlator-option *-posix.glusterd-uuid=61b4524c-ccf3-4219-aaff-b3497ac6dd24
>   --process-name brick --brick-port 49158
>   --xlator-option vol-server.listen-port=49158
>
> Here, /var/run/gluster/1259033d2ff4f4e5.socket is the unix domain socket.
>
> Step - 2
> Run the following command to kill a brick on the same node -
>
> gf_attach -d <unix_domain_socket> brick_path_on_that_node
>
> Example:
>
> gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
>
> Status of volume: vol
> Gluster process                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick apandey:/home/apandey/bricks/gluster/vol-1  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-2  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-3  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-4  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-5  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-6  49158     0          Y       28311
> Self-heal Daemon on localhost                     N/A       N/A        Y       29787
>
> Task Status of Volume vol
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@apandey glusterfs]# gf_attach -d /var/run/gluster/1259033d2ff4f4e5.socket /home/apandey/bricks/gluster/vol-6
> OK
> [root@apandey glusterfs]# gluster v status
> Status of volume: vol
> Gluster process                                   TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick apandey:/home/apandey/bricks/gluster/vol-1  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-2  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-3  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-4  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-5  49158     0          Y       28311
> Brick apandey:/home/apandey/bricks/gluster/vol-6  N/A       N/A        N       N/A
> Self-heal Daemon on localhost                     N/A       N/A        Y       29787
>
> Task Status of Volume vol
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> To start a brick we just need to start the volume using "force":
>
> gluster v start <volname> force
>
> ----
> Ashish
>
> ------------------------------
> From: "Mauro Tridici" <[email protected]>
> To: "Ashish Pandey" <[email protected]>
> Cc: "Gluster Users" <[email protected]>
> Sent: Friday, September 28, 2018 9:25:53 PM
> Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
>
> I asked you how to detect the PID of a specific brick because I see that
> more than one brick has the same PID (also on my virtual env).
> If I kill one of them, I risk killing some other brick. Is that normal?
>
> [root@s01 ~]# gluster vol status
> Status of volume: tier2
> Gluster process                      TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick s01-stg:/gluster/mnt1/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt1/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt1/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt2/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt2/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt2/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt3/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt3/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt3/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt4/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt4/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt4/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt5/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt5/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt5/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt6/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt6/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt6/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt7/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt7/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt7/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt8/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt8/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt8/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt9/brick    49153     0          Y       3956
> Brick s02-stg:/gluster/mnt9/brick    49153     0          Y       3956
> Brick s03-stg:/gluster/mnt9/brick    49153     0          Y       3953
> Brick s01-stg:/gluster/mnt10/brick   49153     0          Y       3956
> Brick s02-stg:/gluster/mnt10/brick   49153     0          Y       3956
> Brick s03-stg:/gluster/mnt10/brick   49153     0          Y       3953
> Brick s01-stg:/gluster/mnt11/brick   49153     0          Y       3956
> Brick s02-stg:/gluster/mnt11/brick   49153     0          Y       3956
> Brick s03-stg:/gluster/mnt11/brick   49153     0          Y       3953
> Brick s01-stg:/gluster/mnt12/brick   49153     0          Y       3956
> Brick s02-stg:/gluster/mnt12/brick   49153     0          Y       3956
> Brick s03-stg:/gluster/mnt12/brick   49153     0          Y       3953
> Brick s04-stg:/gluster/mnt1/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt2/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt3/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt4/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt5/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt6/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt7/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt8/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt9/brick    49153     0          Y       3433
> Brick s04-stg:/gluster/mnt10/brick   49153     0          Y       3433
> Brick s04-stg:/gluster/mnt11/brick   49153     0          Y       3433
> Brick s04-stg:/gluster/mnt12/brick   49153     0          Y       3433
> Brick s05-stg:/gluster/mnt1/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt2/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt3/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt4/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt5/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt6/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt7/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt8/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt9/brick    49153     0          Y       3709
> Brick s05-stg:/gluster/mnt10/brick   49153     0          Y       3709
> Brick s05-stg:/gluster/mnt11/brick   49153     0          Y       3709
> Brick s05-stg:/gluster/mnt12/brick   49153     0          Y       3709
> Brick s06-stg:/gluster/mnt1/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt2/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt3/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt4/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt5/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt6/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt7/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt8/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt9/brick    49153     0          Y       3644
> Brick s06-stg:/gluster/mnt10/brick   49153     0          Y       3644
> Brick s06-stg:/gluster/mnt11/brick   49153     0          Y       3644
> Brick s06-stg:/gluster/mnt12/brick   49153     0          Y       3644
> Self-heal Daemon on localhost        N/A       N/A        Y       79376
> Quota Daemon on localhost            N/A       N/A        Y       79472
> Bitrot Daemon on localhost           N/A       N/A        Y       79485
> Scrubber Daemon on localhost         N/A       N/A        Y       79505
> Self-heal Daemon on s03-stg          N/A       N/A        Y       77073
> Quota Daemon on s03-stg              N/A       N/A        Y       77148
> Bitrot Daemon on s03-stg             N/A       N/A        Y       77160
> Scrubber Daemon on s03-stg           N/A       N/A        Y       77191
> Self-heal Daemon on s02-stg          N/A       N/A        Y       80150
> Quota Daemon on s02-stg              N/A       N/A        Y       80226
> Bitrot Daemon on s02-stg             N/A       N/A        Y       80238
> Scrubber Daemon on s02-stg           N/A       N/A        Y       80269
> Self-heal Daemon on s04-stg          N/A       N/A        Y       106815
> Quota Daemon on s04-stg              N/A       N/A        Y       106866
> Bitrot Daemon on s04-stg             N/A       N/A        Y       106878
> Scrubber Daemon on s04-stg           N/A       N/A        Y       106897
> Self-heal Daemon on s05-stg          N/A       N/A        Y       130807
> Quota Daemon on s05-stg              N/A       N/A        Y       130884
> Bitrot Daemon on s05-stg             N/A       N/A        Y       130896
> Scrubber Daemon on s05-stg           N/A       N/A        Y       130927
> Self-heal Daemon on s06-stg          N/A       N/A        Y       157146
> Quota Daemon on s06-stg              N/A       N/A        Y       157239
> Bitrot Daemon on s06-stg             N/A       N/A        Y       157252
> Scrubber Daemon on s06-stg           N/A       N/A        Y       157288
>
> Task Status of Volume tier2
> ------------------------------------------------------------------------------
> Task           : Remove brick
> ID             : 06ec63bb-a441-4b85-b3cf-ac8e9df4830f
> Removed bricks:
> s04-stg:/gluster/mnt1/brick
> s04-stg:/gluster/mnt2/brick
> s04-stg:/gluster/mnt3/brick
> s04-stg:/gluster/mnt4/brick
> s04-stg:/gluster/mnt5/brick
> s04-stg:/gluster/mnt6/brick
> Status         : in progress
>
> [root@s01 ~]# ps -ef | grep glusterfs
> root 3956 1 79 set25 ? 2-14:33:57 /usr/sbin/glusterfsd -s s01-stg
>   --volfile-id tier2.s01-stg.gluster-mnt1-brick
>   -p /var/run/gluster/vols/tier2/s01-stg-gluster-mnt1-brick.pid
>   -S /var/run/gluster/a889b8a21ac2afcbfa0563b9dd4db265.socket
>   --brick-name /gluster/mnt1/brick
>   -l /var/log/glusterfs/bricks/gluster-mnt1-brick.log
>   --xlator-option *-posix.glusterd-uuid=b734b083-4630-4523-9402-05d03565efee
>   --brick-port 49153 --xlator-option tier2-server.listen-port=49153
> root 79376 1 0 09:16 ? 00:04:16 /usr/sbin/glusterfs -s localhost
>   --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid
>   -l /var/log/glusterfs/glustershd.log
>   -S /var/run/gluster/4fab1a27e6ee700b3b9a3b3393ab7445.socket
>   --xlator-option *replicate*.node-uuid=b734b083-4630-4523-9402-05d03565efee
> root 79472 1 0 09:16 ? 00:00:42 /usr/sbin/glusterfs -s localhost
>   --volfile-id gluster/quotad -p /var/run/gluster/quotad/quotad.pid
>   -l /var/log/glusterfs/quotad.log
>   -S /var/run/gluster/958ab34799fc58f4dfe20e5732eea70b.socket
>   --xlator-option *replicate*.data-self-heal=off
>   --xlator-option *replicate*.metadata-self-heal=off
>   --xlator-option *replicate*.entry-self-heal=off
> root 79485 1 7 09:16 ? 00:40:43 /usr/sbin/glusterfs -s localhost
>   --volfile-id gluster/bitd -p /var/run/gluster/bitd/bitd.pid
>   -l /var/log/glusterfs/bitd.log
>   -S /var/run/gluster/b2ea9da593fae1bc4d94e65aefdbdda9.socket --global-timer-wheel
> root 79505 1 0 09:16 ? 00:00:01 /usr/sbin/glusterfs -s localhost
>   --volfile-id gluster/scrub -p /var/run/gluster/scrub/scrub.pid
>   -l /var/log/glusterfs/scrub.log
>   -S /var/run/gluster/ee7886cbcf8d2adf261084b608c905d5.socket --global-timer-wheel
> root 137362 137225 0 17:53 pts/0 00:00:00 grep --color=auto glusterfs
>
> On 28 Sep 2018, at 17:47, Ashish Pandey <[email protected]> wrote:
>
> ------------------------------
> From: "Mauro Tridici" <[email protected]>
> To: "Ashish Pandey" <[email protected]>
> Cc: "Gluster Users" <[email protected]>
> Sent: Friday, September 28, 2018 9:08:52 PM
> Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
>
> Thank you, Ashish.
>
> I will study and try your solution on my virtual env.
> How can I detect the process of a brick on a gluster server?
>
> Many Thanks,
> Mauro
>
> gluster v status <volname> will give you the list of bricks and the
> respective process IDs.
> Also, you can use "ps aux | grep glusterfs" to see all the processes on
> a node, but I think the above step does the same.
>
> ---
> Ashish
>
> On Fri, 28 Sep 2018 at 16:39, Ashish Pandey <[email protected]> wrote:
>
>> ------------------------------
>> From: "Mauro Tridici" <[email protected]>
>> To: "Ashish Pandey" <[email protected]>
>> Cc: "gluster-users" <[email protected]>
>> Sent: Friday, September 28, 2018 7:08:41 PM
>> Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
>>
>> Dear Ashish,
>>
>> Please excuse me, I'm very sorry for the misunderstanding.
>> Before contacting you during the last few days, we checked all network
>> devices (10GbE switch, cables, NICs, server ports, and so on),
>> operating system versions and settings, network bonding configuration,
>> gluster package versions, tuning profiles, etc., but everything seems
>> to be ok. The first 3 servers (and volume) operated without problems
>> for one year.
>> After we added the new 3 servers, we noticed something wrong.
>> Fortunately, yesterday you gave me a hand in understanding where the
>> problem is (or could be).
>>
>> At this moment, after we re-launched the remove-brick command, it seems
>> that the rebalance is going ahead without errors, but it is only
>> scanning the files.
>> It may be that some errors appear during the coming data movement.
>>
>> For this reason, it could be useful to know how to proceed in case of a
>> new failure: insist with approach no. 1 or change strategy?
>> We are thinking of trying to complete the running remove-brick
>> procedure and making a decision based on the outcome.
>>
>> Question: could we start approach no. 2 even after having successfully
>> removed the V1 subvolume?
>>
>> Yes, we can do that. My idea is to use the replace-brick command.
>> We will kill ONLY one brick process on s06. We will format this brick.
>> Then use the replace-brick command to replace a brick of a volume on
>> s05 with this formatted brick.
>> Heal will be triggered and the data of the respective volume will be
>> placed on this brick.
>>
>> Now we can format the brick which got freed up on s05, and replace the
>> brick which we killed on s06 with it.
>> During this process, we have to make sure heal has completed before
>> trying any other replace/kill brick.
>>
>> It is tricky but looks doable. Think about it and try to perform it in
>> your virtual environment first before trying it in production.
>> -------
>>
>> If it is still possible, could you please illustrate approach no. 2
>> even if I don't have free disks?
>> I would like to start thinking about it and test it in a virtual
>> environment.
>>
>> Thank you in advance for your help and patience.
>> Regards,
>> Mauro
>>
>> On 28 Sep 2018, at 14:36, Ashish Pandey <[email protected]> wrote:
>>
>> We could have taken approach 2 even if you did not have free disks.
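The kill/format/replace cycle described in the thread can be sketched as a command outline. This is only a sketch under assumptions: mntN/mntM and the socket id are placeholders rather than values from the thread, and gf_attach is used because brick multiplexing is on in this setup.

```shell
# Outline of one cycle of "approach 2" (placeholder paths, untested).
# 1) Kill ONLY one brick process on s06; with brick multiplexing on,
#    a plain kill would take down every brick, so use gf_attach:
gf_attach -d /var/run/gluster/<socket-id>.socket /gluster/mntM/brick
# 2) Re-format the freed disk on s06, then move the s05 brick onto it:
gluster volume replace-brick tier2 s05-stg:/gluster/mntN/brick \
    s06-stg:/gluster/mntM/brick commit force
# 3) Wait for self-heal to finish before any further kill/replace:
gluster volume heal tier2 info   # repeat until no entries remain
```

Each cycle frees up one brick on s05, which can in turn be formatted and used as the target of the next cycle.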
>> You should have told me why you were opting for approach 1, or perhaps
>> I should have asked.
>> I was wondering about approach 1 because sometimes rebalance takes time
>> depending upon the data size.
>>
>> Anyway, I hope the whole setup is stable, I mean it is not in the
>> middle of something which we cannot stop.
>> If free disks are the only concern, I will give you some more steps to
>> deal with that and follow approach 2.
>>
>> Let me know once you think everything is fine with the system and there
>> is nothing to heal.
>>
>> ---
>> Ashish
>>
>> ------------------------------
>> From: "Mauro Tridici" <[email protected]>
>> To: "Ashish Pandey" <[email protected]>
>> Cc: "gluster-users" <[email protected]>
>> Sent: Friday, September 28, 2018 4:21:03 PM
>> Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
>>
>> Hi Ashish,
>>
>> As I said in my previous message, we adopted the first approach you
>> suggested (setting the network.ping-timeout option to 0).
>> This choice was due to the absence of an empty brick to be used as
>> indicated in the second approach.
>>
>> So, we launched the remove-brick command on the first subvolume (V1,
>> bricks 1,2,3,4,5,6 on server s04).
>> Rebalance started moving the data across the other bricks but, after
>> about 3TB of moved data, rebalance speed slowed down and some transfer
>> errors appeared in the rebalance.log of server s04.
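To quantify transfer errors like the ones reported above, the rebalance log can be grepped for error-level lines, since glusterfs logs mark severity with a single letter after the timestamp. A minimal sketch: the `tier2-rebalance.log` name follows the usual `<volname>-rebalance.log` convention and is an assumption, and the sample text below is invented for illustration.

```shell
# Count error-level (" E ") lines; the sample stands in for the real log.
log_sample='[2018-09-28 12:00:01.000000] I [dht-rebalance.c:1000] 0-tier2-dht: migration ok
[2018-09-28 12:00:02.000000] E [dht-rebalance.c:2000] 0-tier2-dht: failed to migrate file'
printf '%s\n' "$log_sample" | grep -c ' E '
# On s04 the equivalent against the live log would be something like:
#   grep ' E ' /var/log/glusterfs/tier2-rebalance.log | tail
```

Watching whether the error count still grows after a restart is a quick way to tell a transient slowdown from a recurring failure.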
>> At this point, since the remaining 1.8TB needs to be moved in order to
>> complete the step, we decided to stop the remove-brick execution and
>> start it again (I hope it doesn't stop again before completing the
>> rebalance).
>>
>> Now rebalance is not moving data, it's only scanning files (please take
>> a look at the following output):
>>
>> [root@s01 ~]# gluster volume remove-brick tier2 s04-stg:/gluster/mnt1/brick s04-stg:/gluster/mnt2/brick s04-stg:/gluster/mnt3/brick s04-stg:/gluster/mnt4/brick s04-stg:/gluster/mnt5/brick s04-stg:/gluster/mnt6/brick status
>>     Node  Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
>>  s04-stg                 0  0Bytes   182008         0        0  in progress  3:08:09
>> Estimated time left for rebalance to complete : 442:45:06
>>
>> If I'm not wrong, remove-brick rebalances the entire cluster each time
>> it starts.
>> Is there a way to speed up this procedure? Do you have some other
>> suggestion that, in this particular case, could be useful to reduce
>> errors (I know that they are related to the current volume
>> configuration) and improve rebalance performance, avoiding rebalancing
>> the entire cluster?
>>
>> Thank you in advance,
>> Mauro
>>
>> On 27 Sep 2018, at 13:14, Ashish Pandey <[email protected]> wrote:
>>
>> Yes, you can.
>> If not me, others may also reply.
>>
>> ---
>> Ashish
>>
>> ------------------------------
>> From: "Mauro Tridici" <[email protected]>
>> To: "Ashish Pandey" <[email protected]>
>> Cc: "gluster-users" <[email protected]>
>> Sent: Thursday, September 27, 2018 4:24:12 PM
>> Subject: Re: [Gluster-users] Rebalance failed on Distributed Disperse volume based on 3.12.14 version
>>
>> Dear Ashish,
>>
>> I cannot thank you enough!
>> Your procedure and description are very detailed.
>> I think I'll follow the first approach, after setting the
>> network.ping-timeout option to 0 (if I'm not wrong, "0" means
>> "infinite"... I noticed that this value reduced rebalance errors).
>> After the fix I will set the network.ping-timeout option back to its
>> default value.
>>
>> Could I contact you again if I need some kind of suggestion?
>>
>> Thank you very much again.
>> Have a good day,
>> Mauro
>>
>> On 27 Sep 2018, at 12:38, Ashish Pandey <[email protected]> wrote:
>>
>> Hi Mauro,
>>
>> We can divide the 36 newly added bricks into 6 sets of 6 bricks each,
>> starting from brick37.
>> That means there are 6 EC subvolumes and we have to deal with one
>> subvolume at a time.
>> I have named them V1 to V6.
>>
>> Problem:
>> Take the case of V1.
>> The best configuration/setup would be to have all the 6 bricks of V1 on
>> 6 different nodes.
>> However, in your case you have added 3 new nodes. So, at least we
>> should have 2 bricks on 3 different newly added nodes.
>> This way, in a 4+2 EC configuration, even if one node goes down you
>> will still have 4 other bricks of that volume.
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> https://lists.gluster.org/mailman/listinfo/gluster-users
> ...
> [Message clipped]
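The placement rule quoted above (2 bricks of each 4+2 set on each of the 3 new nodes) can be illustrated with a tiny loop; the s04/s05/s06 names follow the thread, and the round-robin assignment is only an illustration of the idea, not the actual brick order used.

```shell
# Spread the 6 bricks of subvolume V1 round-robin across s04, s05, s06:
# each node gets 2 bricks, so losing one node removes only 2 of the 6
# bricks and a 4+2 erasure-coded set still has the 4 bricks it needs.
for i in 1 2 3 4 5 6; do
  node=$(( (i - 1) % 3 + 4 ))   # cycles 4, 5, 6, 4, 5, 6
  echo "V1-brick$i -> s0${node}-stg"
done
```

The same cycling applies to V2 through V6, giving every EC set single-node fault tolerance across the three new servers.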
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
