Just wanted to ping this thread to see if you guys had any thoughts, or any scripts I can run to speed this up. It's still predicting another 90 days to finish the rebalance, and performance is basically garbage while it rebalances.
Rusty

On Mon, Jul 23, 2018 at 10:19 AM, Rusty Bower <ru...@rustybower.com> wrote:

> datanode03 is the newest brick
>
> the bricks had gotten pretty full, which I think might be part of the
> issue:
> - datanode01 /dev/sda1   51T   48T  3.3T  94% /mnt/data
> - datanode02 /dev/sda1   51T   48T  3.4T  94% /mnt/data
> - datanode03 /dev/md0   128T  4.6T  123T   4% /mnt/data
>
> each of the bricks are on a completely separate disk from the OS
>
> I'll shoot you the log files offline :)
>
> Thanks!
> Rusty
>
> On Mon, Jul 23, 2018 at 3:12 AM, Nithya Balachandran <nbala...@redhat.com>
> wrote:
>
>> Hi Rusty,
>>
>> Sorry I took so long to get back to you.
>>
>> Which is the newly added brick? I see datanode02 has not picked up any
>> files for migration which is odd.
>> How full are the individual bricks (df -h ) output.
>> Is each of your bricks in a separate partition?
>> Can you send me the rebalance logs from all 3 nodes (offline if you
>> prefer)?
>>
>> We can try using scripts to speed up the rebalance if you prefer.
>>
>> Regards,
>> Nithya
>>
>> On 16 July 2018 at 22:06, Rusty Bower <ru...@rustybower.com> wrote:
>>
>>> Thanks for the reply Nithya.
>>>
>>> 1. glusterfs 4.1.1
>>>
>>> 2. Volume Name: data
>>> Type: Distribute
>>> Volume ID: 294d95ce-0ff3-4df9-bd8c-a52fc50442ba
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: datanode01:/mnt/data/bricks/data
>>> Brick2: datanode02:/mnt/data/bricks/data
>>> Brick3: datanode03:/mnt/data/bricks/data
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>>
>>> 3.
>>> Node        Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
>>> ---------   ----------------  -------  -------  --------  -------  -----------  -----------------
>>> localhost   36822             11.3GB   50715    0         0        in progress  26:46:17
>>> datanode02  0                 0Bytes   2852     0         0        in progress  26:46:16
>>> datanode03  3128              513.7MB  11442    0         3128     in progress  26:46:17
>>> Estimated time left for rebalance to complete : > 2 months. Please try again later.
>>> volume rebalance: data: success
>>>
>>> 4. Directory structure is basically an rsync backup of some old systems
>>> as well as all of my personal media. I can elaborate more, but it's a
>>> pretty standard filesystem.
>>>
>>> 5. In some folders there might be up to like 12-15 levels of directories
>>> (especially the backups)
>>>
>>> 6. I'm honestly not sure, I can try to scrounge this number up
>>>
>>> 7. My guess would be > 100k
>>>
>>> 8. Most files are pretty large (media files), but there's a lot of small
>>> files (metadata and configuration files) as well
>>>
>>> I've also appended a (moderately sanitized) snippet of the rebalance
>>> log (let me know if you need more)
>>>
>>> [2018-07-16 17:37:59.979003] I [MSGID: 0] [dht-rebalance.c:1799:dht_migrate_file] 0-data-dht: destination for file - /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml is changed to - data-client-2
>>> [2018-07-16 17:38:00.004262] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2112002.img.xml from subvolume data-client-0 to data-client-2
>>> [2018-07-16 17:38:00.725582] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=446597.869797, elapsed = 96526.000000
>>> [2018-07-16 17:38:00.725641] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124092127 seconds, seconds left = 123995601
>>> [2018-07-16 17:38:00.725709] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96526.00 secs
>>> [2018-07-16 17:38:00.725738] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>> [2018-07-16 17:38:02.769121] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108305980 tmp_cnt = 55419279917056,rate_processed=446588.616567, elapsed = 96528.000000
>>> [2018-07-16 17:38:02.769207] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124094698 seconds, seconds left = 123998170
>>> [2018-07-16 17:38:02.769263] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96528.00 secs
>>> [2018-07-16 17:38:02.769286] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36876, size: 12270259289, lookups: 50715, failures: 0, skipped: 0
>>> [2018-07-16 17:38:03.410469] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml: attempting to move from data-client-0 to data-client-2
>>> [2018-07-16 17:38:03.416127] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2040036.img.xml from subvolume data-client-0 to data-client-2
>>> [2018-07-16 17:38:04.738885] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml: attempting to move from data-client-0 to data-client-2
>>> [2018-07-16 17:38:04.745722] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201002.img.xml from subvolume data-client-0 to data-client-2
>>> [2018-07-16 17:38:04.812368] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108308134 tmp_cnt = 55419279917056,rate_processed=446579.386035, elapsed = 96530.000000
>>> [2018-07-16 17:38:04.812417] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124097263 seconds, seconds left = 124000733
>>> [2018-07-16 17:38:04.812465] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96530.00 secs
>>> [2018-07-16 17:38:04.812489] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36877, size: 12270261443, lookups: 50715, failures: 0, skipped: 0
>>> [2018-07-16 17:38:04.992413] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml: attempting to move from data-client-0 to data-client-2
>>> [2018-07-16 17:38:04.994122] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9110012.img.xml from subvolume data-client-0 to data-client-2
>>> [2018-07-16 17:38:06.855618] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446570.244043, elapsed = 96532.000000
>>> [2018-07-16 17:38:06.855719] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124099804 seconds, seconds left = 124003272
>>> [2018-07-16 17:38:06.855770] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96532.00 secs
>>> [2018-07-16 17:38:06.855793] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>> [2018-07-16 17:38:08.511064] I [dht-rebalance.c:1645:dht_migrate_file] 0-data-dht: /this/is/a/file/path/that/exists/wz/wz/Npc.wz/9201055.img.xml: attempting to move from data-client-0 to data-client-2
>>> [2018-07-16 17:38:08.533029] I [MSGID: 109022] [dht-rebalance.c:2274:dht_migrate_file] 0-data-dht: completed migration of /this/is/a/file/path/that/exists/wz/wz/Npc.wz/2050000.img.xml from subvolume data-client-0 to data-client-2
>>> [2018-07-16 17:38:08.899708] I [dht-rebalance.c:4982:gf_defrag_get_estimates_based_on_size] 0-glusterfs: TIME: (size) total_processed=43108318798 tmp_cnt = 55419279917056,rate_processed=446560.991961, elapsed = 96534.000000
>>> [2018-07-16 17:38:08.899791] I [dht-rebalance.c:5130:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete (size)= 124102375 seconds, seconds left = 124005841
>>> [2018-07-16 17:38:08.899842] I [MSGID: 109028] [dht-rebalance.c:5210:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 96534.00 secs
>>> [2018-07-16 17:38:08.899865] I [MSGID: 109028] [dht-rebalance.c:5214:gf_defrag_status_get] 0-glusterfs: Files migrated: 36879, size: 12270266602, lookups: 50715, failures: 0, skipped: 0
>>>
>>> On Mon, Jul 16, 2018 at 7:37 AM, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>>> If possible, please send the rebalance logs as well.
>>>>
>>>> On 16 July 2018 at 10:14, Nithya Balachandran <nbala...@redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Rusty,
>>>>>
>>>>> We need the following information:
>>>>>
>>>>> 1. The exact gluster version you are running
>>>>> 2. gluster volume info <volname>
>>>>> 3. gluster rebalance status
>>>>> 4. Information on the directory structure and file locations on
>>>>> your volume.
>>>>> 5. How many levels of directories
>>>>> 6. How many files and directories in each level
>>>>> 7. How many directories and files in total (a rough estimate)
>>>>> 8. Average file size
>>>>>
>>>>> Please note that having a rebalance running in the background should
>>>>> not affect your volume access in any way. However I would like to know why
>>>>> only 6000 files have been scanned in 6 hours.
>>>>>
>>>>> Regards,
>>>>> Nithya
>>>>>
>>>>> On 16 July 2018 at 06:13, Rusty Bower <ru...@rustybower.com> wrote:
>>>>>
>>>>>> Hey folks,
>>>>>>
>>>>>> I just added a new brick to my existing gluster volume, but *gluster
>>>>>> volume rebalance data status* is telling me the following: Estimated
>>>>>> time left for rebalance to complete : > 2 months. Please try again later.
>>>>>>
>>>>>> I already did a fix-mapping, but this thing is absolutely crawling
>>>>>> trying to rebalance everything (last estimate was ~40 years)
>>>>>>
>>>>>> Any thoughts on if this is a bug, or ways to speed this up? It's
>>>>>> taking ~6 hours to scan 6000 files, which seems unreasonably slow.
>>>>>>
>>>>>> Thanks
>>>>>> Rusty
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
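
For anyone reading the quoted rebalance log, the estimate it prints can be reproduced from the fields on the "TIME: (size)" line. The sketch below is only inferred from the logged numbers, not taken from the GlusterFS source: it assumes the rate is total_processed / elapsed and that the total time is tmp_cnt / rate, and it treats tmp_cnt as the total number of bytes the estimator expects to process. The function name estimate_rebalance is made up for illustration.

```python
# Values copied from the first "TIME: (size)" entry in the quoted log.
TOTAL_PROCESSED = 43108305980   # log field: total_processed (assumed to be bytes)
TMP_CNT = 55419279917056        # log field: tmp_cnt (assumed: total bytes to process)
ELAPSED = 96526.0               # log field: elapsed (seconds)


def estimate_rebalance(total_processed, tmp_cnt, elapsed):
    """Recompute the logged estimate (hypothetical helper, inferred from the log)."""
    rate = total_processed / elapsed      # bytes per second so far
    total_seconds = tmp_cnt / rate        # projected total run time
    seconds_left = total_seconds - elapsed
    return rate, total_seconds, seconds_left


rate, total_seconds, seconds_left = estimate_rebalance(
    TOTAL_PROCESSED, TMP_CNT, ELAPSED
)

print(f"rate_processed ~ {rate:.6f} bytes/s")
print(f"estimated total = {int(total_seconds)} s")
print(f"seconds left    = {int(seconds_left)} s")
print(f"days left       ~ {seconds_left / 86400:.0f}")
```

With these inputs the truncated results line up with the logged values (124092127 seconds total, 123995601 seconds left), which works out to roughly 1,435 days. If the assumption about the formula holds, the estimate is driven almost entirely by the ~446 KB/s processing rate, so it would only drop if that rate goes up.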