Hi Mauro,

Please stop the rebalance before you upgrade.

Thanks,
Nithya
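For reference, a minimal sketch of that step, assuming the volume name from the thread below (tier2). The `gluster volume rebalance <VOL> stop` and `status` subcommands are standard CLI; the snippet only prints the commands when the `gluster` binary is not present:

```shell
VOL=tier2
if command -v gluster >/dev/null 2>&1; then
  # Stop the in-progress rebalance, then confirm it is no longer running
  gluster volume rebalance "$VOL" stop
  gluster volume rebalance "$VOL" status
else
  # gluster not installed on this machine: show the commands that would run
  echo "gluster volume rebalance $VOL stop"
  echo "gluster volume rebalance $VOL status"
fi
```

Run this on any node of the trusted storage pool before starting the upgrade on any server.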
On 15 September 2018 at 22:55, Mauro Tridici <mauro.trid...@cmcc.it> wrote:
>
> Hi Sunil,
>
> many thanks to you too.
> I will follow your suggestions and the guide for upgrading to 3.12.
>
> Crossing fingers :-)
> Regards,
> Mauro
>
> On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya <shegg...@redhat.com> wrote:
>
> Hi Mauro,
>
> As Nithya highlighted, FALLOCATE support for EC volumes went in with 3.11 as
> part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence,
> upgrading to 3.12 as suggested before would be the right move.
>
> Here is the documentation for upgrading to 3.12:
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
>
> Regards,
> Sunil Kumar Acharya
> Senior Software Engineer
> Red Hat
>
> On Sat, Sep 15, 2018 at 3:42 AM, Mauro Tridici <mauro.trid...@cmcc.it> wrote:
>>
>> Hi Nithya,
>>
>> thank you very much for your answer.
>> I will wait for @Sunil's opinion too before starting the upgrade procedure.
>>
>> Since it will be the first upgrade of our Gluster cluster, I would like to
>> know whether it could be a "virtually dangerous" procedure and whether there
>> is a risk of losing data :-)
>> Unfortunately, I can't make a preventive copy of the volume data to another
>> location.
>> If possible, could you please illustrate the steps needed to complete the
>> upgrade procedure from version 3.10.5 to 3.12?
>>
>> Thank you again, Nithya.
>> Thank you to all of you for the help!
>>
>> Regards,
>> Mauro
>>
>> On 14 Sep 2018, at 16:59, Nithya Balachandran <nbala...@redhat.com> wrote:
>>
>> Hi Mauro,
>>
>> The rebalance code started using fallocate in 3.10.5
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
>> replicated volumes.
>> However, we neglected to test this with EC volumes on 3.10. Once we
>> discovered the issue, the EC fallocate implementation was made available
>> in 3.11.
>>
>> At this point, I'm afraid the only option I see is to upgrade to at least 3.12.
>>
>> @Sunil, do you have anything to add?
>>
>> Regards,
>> Nithya
>>
>> On 13 September 2018 at 18:34, Mauro Tridici <mauro.trid...@cmcc.it> wrote:
>>>
>>> Hi Nithya,
>>>
>>> thank you for involving the EC group.
>>> I will wait for your suggestions.
>>>
>>> Regards,
>>> Mauro
>>>
>>> On 13 Sep 2018, at 13:38, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>> This looks like an issue because rebalance switched to using fallocate,
>>> which EC had not implemented at that point.
>>>
>>> @Pranith, @Ashish, which version of gluster had support for fallocate in EC?
>>>
>>> Regards,
>>> Nithya
>>>
>>> On 12 September 2018 at 19:24, Mauro Tridici <mauro.trid...@cmcc.it> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I recently added 3 servers (each one with 12 bricks) to an existing
>>>> Gluster Distributed Disperse volume.
>>>> The volume extension completed without error, and I had already executed
>>>> the rebalance procedure with the fix-layout option with no problem.
>>>> I then launched the rebalance procedure without the fix-layout option,
>>>> but, as you can see in the output below, some failures have been detected.
>>>>
>>>> [root@s01 glusterfs]# gluster v rebalance tier2 status
>>>>      Node   Rebalanced-files      size   scanned   failures   skipped        status   run time in h:m:s
>>>> ---------   ----------------   -------   -------   --------   -------   -----------   -----------------
>>>> localhost              71176     3.2MB   2137557    1530391      8128   in progress            13:59:05
>>>>   s02-stg                  0    0Bytes         0          0         0     completed            11:53:28
>>>>   s03-stg                  0    0Bytes         0          0         0     completed            11:53:32
>>>>   s04-stg                  0    0Bytes         0          0         0     completed             0:00:06
>>>>   s05-stg                 15    0Bytes     17055          0        18     completed            10:48:01
>>>>   s06-stg                  0    0Bytes         0          0         0     completed             0:00:06
>>>> Estimated time left for rebalance to complete : 0:46:53
>>>> volume rebalance: tier2: success
>>>>
>>>> In the volume rebalance log file, I detected a lot of error messages
>>>> similar to the following ones:
>>>>
>>>> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
>>>> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
>>>> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>>
>>>> Could you please help me to understand what is happening and how to solve it?
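The "(Operation not supported)" in these log lines is the fallocate call failing with EOPNOTSUPP. In this thread the rejection comes from the EC translator itself (which lacked fallocate support before 3.11), not from the backend filesystem, but the errno is the same one you see when probing any filesystem with the util-linux `fallocate` tool. A small illustrative probe, using a temporary file:

```shell
# Probe whether preallocation works on the local filesystem.
# Both outcomes print a message; a refusal here is the same
# EOPNOTSUPP that the rebalance log above reports.
probe_file=$(mktemp)
if fallocate -l 4096 "$probe_file" 2>/dev/null; then
  status="supported ($(stat -c %s "$probe_file") bytes preallocated)"
else
  status="not supported (EOPNOTSUPP, as in the rebalance log)"
fi
echo "fallocate: $status"
rm -f "$probe_file"
```

This only checks the local filesystem; to confirm EC fallocate support in Gluster itself, the version check (3.11 or later, per the thread) is the relevant test.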
>>>>
>>>> Our Gluster implementation is based on Gluster v3.10.5.
>>>>
>>>> Thank you in advance,
>>>> Mauro
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> -------------------------
>>> Mauro Tridici
>>>
>>> Fondazione CMCC
>>> CMCC Supercomputing Center
>>> presso Complesso Ecotekne - Università del Salento -
>>> Strada Prov.le Lecce - Monteroni sn
>>> 73100 Lecce IT
>>> http://www.cmcc.it
>>>
>>> mobile: (+39) 327 5630841
>>> email: mauro.trid...@cmcc.it
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
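Since the thread asks for the concrete upgrade steps but only links the guide, here is a rough per-server sketch of the offline path, as an assumption-laden outline only: it presumes an RPM-based install with the 3.12 repository already enabled, and the linked upgrade guide (https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/) remains the authoritative procedure. The `run` helper defaults to a dry run that just prints each command:

```shell
# DRYRUN=1 (the default) prints commands instead of executing them;
# set DRYRUN=0 to actually run them, one server at a time, after
# stopping the rebalance as advised at the top of the thread.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run systemctl stop glusterd
run killall glusterfs glusterfsd        # stop brick and client processes
run yum -y update glusterfs-server      # assumed package name; pulls in 3.12
run systemctl start glusterd
run gluster --version                   # confirm the new version on this node
```

Repeat per server, waiting for self-heal to settle between servers; again, verify each step against the upgrade guide before running anything for real.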