Hi Nithya,

thank you.
The rebalance terminated (with a lot of failures) a few days ago.
Because of some I/O errors during write operations on the gluster volume, I had to re-launch the fix-layout rebalance.
Now the fix-layout rebalance has completed and I can write data without I/O errors.
As soon as possible I will start the upgrade procedure.

Thank you again for your support.
Regards,
Mauro

> On 16 Sep 2018, at 06:07, Nithya Balachandran <[email protected]> wrote:
>
> Hi Mauro,
>
> Please stop the rebalance before you upgrade.
>
> Thanks,
> Nithya
>
> On 15 September 2018 at 22:55, Mauro Tridici <[email protected]> wrote:
>
> Hi Sunil,
>
> many thanks to you too.
> I will follow your suggestions and the guide for upgrading to 3.12.
>
> Crossing fingers :-)
> Regards,
> Mauro
>
>> On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya
>> <[email protected]> wrote:
>>
>> Hi Mauro,
>>
>> As Nithya highlighted, FALLOCATE support for EC volumes went in 3.11 as part
>> of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence, upgrading to
>> 3.12 as suggested before would be the right move.
>>
>> Here is the documentation for upgrading to 3.12:
>> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
>>
>> Regards,
>> Sunil Kumar Acharya
>> Senior Software Engineer
>> Red Hat
>> T: +91-8067935170
>>
>> On Sat, Sep 15, 2018 at 3:42 AM, Mauro Tridici <[email protected]> wrote:
>>
>> Hi Nithya,
>>
>> thank you very much for your answer.
>> I will wait for @Sunil's opinion too before starting the upgrade procedure.
>>
>> Since it will be the first upgrade of our Gluster cluster, I would like to
>> know if it could be a "virtually dangerous" procedure and if there is a
>> risk of losing data :-)
>> Unfortunately, I can't make a backup copy of the volume data in another
>> location.
>> If possible, could you please describe the steps needed to complete the
>> upgrade from version 3.10.5 to 3.12?
>>
>> Thank you again, Nithya.
>> Thank you to all of you for the help!
>>
>> Regards,
>> Mauro
>>
>>> On 14 Sep 2018, at 16:59, Nithya Balachandran <[email protected]> wrote:
>>>
>>> Hi Mauro,
>>>
>>> The rebalance code started using fallocate in 3.10.5
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
>>> replicated volumes. However, we neglected to test this with EC volumes on
>>> 3.10. Once we discovered the issue, the EC fallocate implementation was
>>> made available in 3.11.
>>>
>>> At this point, I'm afraid the only option I see is to upgrade to at least
>>> 3.12.
>>>
>>> @Sunil, do you have anything to add?
>>>
>>> Regards,
>>> Nithya
>>>
>>> On 13 September 2018 at 18:34, Mauro Tridici <[email protected]> wrote:
>>>
>>> Hi Nithya,
>>>
>>> thank you for involving the EC group.
>>> I will wait for your suggestions.
>>>
>>> Regards,
>>> Mauro
>>>
>>>> On 13 Sep 2018, at 13:38, Nithya Balachandran <[email protected]> wrote:
>>>>
>>>> This looks like an issue because rebalance switched to using fallocate,
>>>> which EC did not have implemented at that point.
>>>>
>>>> @Pranith, @Ashish, which version of gluster had support for fallocate in
>>>> EC?
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On 12 September 2018 at 19:24, Mauro Tridici <[email protected]> wrote:
>>>>
>>>> Dear All,
>>>>
>>>> I recently added 3 servers (each one with 12 bricks) to an existing
>>>> Gluster Distributed Disperse Volume.
>>>> Volume extension has been completed without error and I already executed
>>>> the rebalance procedure with fix-layout option with no problem.
>>>> I just launched the rebalance procedure without fix-layout option, but, as
>>>> you can see in the output below, I noticed that some failures have been
>>>> detected.
>>>>
>>>> [root@s01 glusterfs]# gluster v rebalance tier2 status
>>>> Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
>>>> ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
>>>> localhost             71176  3.2MB   2137557   1530391     8128  in progress  13:59:05
>>>> s02-stg                   0  0Bytes        0         0        0  completed    11:53:28
>>>> s03-stg                   0  0Bytes        0         0        0  completed    11:53:32
>>>> s04-stg                   0  0Bytes        0         0        0  completed    0:00:06
>>>> s05-stg                  15  0Bytes    17055         0       18  completed    10:48:01
>>>> s06-stg                   0  0Bytes        0         0        0  completed    0:00:06
>>>> Estimated time left for rebalance to complete : 0:46:53
>>>> volume rebalance: tier2: success
>>>>
>>>> In the volume rebalance log file, I detected a lot of error messages
>>>> similar to the following ones:
>>>>
>>>> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
>>>> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
>>>> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>>
>>>> Could you please help me to understand what is happening and how to solve
>>>> it?
>>>>
>>>> Our Gluster implementation is based on Gluster v.3.10.5
>>>>
>>>> Thank you in advance,
>>>> Mauro

-------------------------
Mauro Tridici

Fondazione CMCC
CMCC Supercomputing Center
presso Complesso Ecotekne - Università del Salento -
Strada Prov.le Lecce - Monteroni sn
73100 Lecce IT
http://www.cmcc.it

mobile: (+39) 327 5630841
email: [email protected]
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
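
For anyone hitting the same errors, the "fallocate failed" entries quoted in the thread above can be tallied per subvolume to see how widely the problem is spread. Below is a minimal Python sketch under a few assumptions: each log entry sits on a single line (the wrapping in the mail is the mail client's), file paths contain no spaces, and the name summarize_fallocate_failures and the regular expression are illustrative only, not part of Gluster.

```python
import re
from collections import Counter

# Matches entries such as:
#   ... fallocate failed for <path> on <subvolume> (<reason>)
# Assumption: paths contain no whitespace.
FALLOCATE_RE = re.compile(
    r"fallocate failed for (?P<path>\S+) on (?P<subvol>\S+) \((?P<reason>[^)]*)\)"
)


def summarize_fallocate_failures(lines):
    """Count 'fallocate failed' entries per (subvolume, reason) pair."""
    counts = Counter()
    for line in lines:
        match = FALLOCATE_RE.search(line)
        if match:
            counts[(match["subvol"], match["reason"])] += 1
    return counts


# Three entries taken from the log excerpt in the thread, unwrapped.
SAMPLE_LOG = [
    "[2018-09-12 13:15:50.759183] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 "
    "(Operation not supported)",
    "[2018-09-12 13:15:50.759206] E [MSGID: 0] "
    "[dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed "
    "on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/"
    "atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc",
    "[2018-09-12 13:15:50.777219] E [MSGID: 109023] "
    "[dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: "
    "fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/"
    "postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 "
    "(Operation not supported)",
]

if __name__ == "__main__":
    summary = summarize_fallocate_failures(SAMPLE_LOG)
    for (subvol, reason), count in sorted(summary.items()):
        print(f"{subvol}: {count} x {reason}")
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG. The rebalance log location depends on the installation, so the path has to be supplied by the reader.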
