Re: [Gluster-users] Failures during rebalance on gluster distributed disperse volume

Mauro Tridici Mon, 17 Sep 2018 01:41:10 -0700

Hi Nithya,

thank you.
Rebalance terminated (with a lot of failures) a few days ago.
Due to some I/O errors during write operations on the Gluster volume, I had to
relaunch the fix-layout rebalance.
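The rebalance failures can be attributed to the missing EC fallocate support by scanning the rebalance log for the `fallocate failed ... (Operation not supported)` errors quoted later in this thread. This is a minimal Python sketch, not a Gluster tool. It assumes the 3.10.5 message wording shown in those log lines and a log file such as `/var/log/glusterfs/tier2-rebalance.log` (the usual `<volume>-rebalance.log` naming; check your installation). The function and script names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the DHT rebalance error quoted in this thread, e.g.
#   "... fallocate failed for /CSP/.../file.nc on tier2-disperse-9 (Operation not supported)"
# Assumption: wording as logged by Gluster 3.10.5; other releases may word it differently.
FALLOCATE_ENOTSUP = re.compile(
    r"fallocate failed for (?P<path>\S+) on (?P<subvol>\S+) \(Operation not supported\)"
)


def count_fallocate_failures(lines):
    """Count 'fallocate ... Operation not supported' errors per destination subvolume."""
    counts = Counter()
    for line in lines:
        match = FALLOCATE_ENOTSUP.search(line)
        if match:
            counts[match.group("subvol")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_enotsup.py /var/log/glusterfs/tier2-rebalance.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for subvol, n in sorted(count_fallocate_failures(log).items()):
            print(f"{subvol}: {n}")
```

The sketch matches on the message text rather than the MSGID. In the quoted excerpt, MSGID 109023 is shared by both the fallocate and the migrate-data errors, so the MSGID alone would not isolate the fallocate case.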


Now the fix-layout rebalance has completed and I can write data without I/O
errors.
As soon as possible I will start the upgrade procedure.
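The version reasoning from this thread can be captured as a small check. Rebalance started using fallocate in 3.10.5, and EC gained fallocate in 3.11, which is why the advice is to move to at least 3.12. This is a sketch only. The helper names are hypothetical, the version boundaries are the ones stated in the thread (not re-checked against release notes), and the parser assumes purely numeric release strings such as `3.10.5`.

```python
def parse_version(text):
    """Turn a numeric release string such as '3.10.5' into a comparable tuple (3, 10, 5)."""
    return tuple(int(part) for part in text.split("."))


# Boundaries as stated in this thread (assumed, not verified against release notes):
REBALANCE_USES_FALLOCATE = parse_version("3.10.5")  # BZ 1473132
EC_SUPPORTS_FALLOCATE = parse_version("3.11")       # BZ 1454686


def rebalance_hits_ec_fallocate_gap(version, is_disperse=True):
    """Return True if a data rebalance on this version is expected to fail with
    'fallocate failed ... (Operation not supported)' on disperse subvolumes.

    Replicated volumes are reported as unaffected, so is_disperse=False returns False.
    """
    v = parse_version(version)
    return is_disperse and REBALANCE_USES_FALLOCATE <= v < EC_SUPPORTS_FALLOCATE
```

For example, `rebalance_hits_ec_fallocate_gap("3.10.5")` is `True`, while `"3.12"` and a replicated volume on 3.10.5 both give `False`. Tuples are compared instead of strings so that `3.10.5` correctly sorts after `3.9`.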

Thank you again for your support.
Regards,
Mauro

> On 16 Sep 2018, at 06:07, Nithya Balachandran
> <[email protected]> wrote:
> 
> Hi Mauro,
> 
> Please stop the rebalance before you upgrade.
> Thanks,
> Nithya
> 
> On 15 September 2018 at 22:55, Mauro Tridici <[email protected]> wrote:
> 
> Hi Sunil,
> 
> many thanks to you too.
> I will follow your suggestions and the guide for upgrading to 3.12.
> 
> Fingers crossed :-)
> Regards,
> Mauro
> 
>> On 15 Sep 2018, at 11:57, Sunil Kumar Heggodu Gopala Acharya
>> <[email protected]> wrote:
>> 
>> Hi Mauro,
>> 
>> As Nithya highlighted, FALLOCATE support for EC volumes went into 3.11 as
>> part of https://bugzilla.redhat.com/show_bug.cgi?id=1454686. Hence,
>> upgrading to 3.12, as suggested before, would be the right move.
>> 
>> Here is the documentation for upgrading to 3.12:
>> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.12/
>> 
>> Regards,
>> SUNIL KUMAR ACHARYA
>> SENIOR SOFTWARE ENGINEER
>> Red Hat
>> T: +91-8067935170
>> 
>> On Sat, Sep 15, 2018 at 3:42 AM, Mauro Tridici <[email protected]> wrote:
>> 
>> Hi Nithya,
>> 
>> thank you very much for your answer.
>> I will wait for @Sunil's opinion too before starting the upgrade procedure.
>> 
>> Since this will be the first upgrade of our Gluster cluster, I would like to
>> know whether it could be a "virtually dangerous" procedure and whether there
>> is a risk of losing data :-)
>> Unfortunately, I can't make a precautionary copy of the volume data in
>> another location.
>> If possible, could you please outline the steps needed to upgrade from
>> version 3.10.5 to 3.12?
>> 
>> Thank you again, Nithya.
>> Thank you to all of you for the help!
>> 
>> Regards,
>> Mauro
>> 
>>> On 14 Sep 2018, at 16:59, Nithya Balachandran
>>> <[email protected]> wrote:
>>> 
>>> Hi Mauro,
>>> 
>>> 
>>> The rebalance code started using fallocate in 3.10.5
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1473132), which works fine on
>>> replicated volumes. However, we neglected to test this with EC volumes on
>>> 3.10. Once we discovered the issue, the EC fallocate implementation was
>>> made available in 3.11.
>>> 
>>> At this point, I'm afraid the only option I see is to upgrade to at least 
>>> 3.12.
>>> 
>>> @Sunil, do you have anything to add?
>>> 
>>> Regards,
>>> Nithya
>>> 
>>> On 13 September 2018 at 18:34, Mauro Tridici <[email protected]> wrote:
>>> 
>>> Hi Nithya,
>>> 
>>> thank you for involving the EC team.
>>> I will wait for your suggestions.
>>> 
>>> Regards,
>>> Mauro
>>> 
>>>> On 13 Sep 2018, at 13:38, Nithya Balachandran
>>>> <[email protected]> wrote:
>>>> 
>>>> This looks like an issue caused by rebalance switching to fallocate,
>>>> which EC had not implemented at that point.
>>>> 
>>>> @Pranith, @Ashish, which version of Gluster added fallocate support to
>>>> EC?
>>>> 
>>>> 
>>>> Regards,
>>>> Nithya
>>>> 
>>>> On 12 September 2018 at 19:24, Mauro Tridici <[email protected]> wrote:
>>>> Dear All,
>>>> 
>>>> I recently added 3 servers (each with 12 bricks) to an existing
>>>> Gluster distributed disperse volume.
>>>> The volume expansion completed without errors, and I had already run
>>>> the rebalance with the fix-layout option without problems.
>>>> I then launched the rebalance without the fix-layout option but, as
>>>> you can see in the output below, some failures have been detected.
>>>> 
>>>> [root@s01 glusterfs]# gluster v rebalance tier2 status
>>>>      Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
>>>> ---------  ----------------  ------  -------  --------  -------  -----------  -----------------
>>>> localhost             71176   3.2MB  2137557   1530391     8128  in progress           13:59:05
>>>>   s02-stg                 0  0Bytes        0         0        0    completed           11:53:28
>>>>   s03-stg                 0  0Bytes        0         0        0    completed           11:53:32
>>>>   s04-stg                 0  0Bytes        0         0        0    completed            0:00:06
>>>>   s05-stg                15  0Bytes    17055         0       18    completed           10:48:01
>>>>   s06-stg                 0  0Bytes        0         0        0    completed            0:00:06
>>>> Estimated time left for rebalance to complete :        0:46:53
>>>> volume rebalance: tier2: success
>>>> 
>>>> In the volume rebalance log file, I detected a lot of error messages 
>>>> similar to the following ones:
>>>> 
>>>> [2018-09-12 13:15:50.756703] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-6 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.757025] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-12_grid.nc
>>>> [2018-09-12 13:15:50.759183] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc on tier2-disperse-9 (Operation not supported)
>>>> [2018-09-12 13:15:50.759206] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-9 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.759536] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2005-09_grid.nc
>>>> [2018-09-12 13:15:50.777219] E [MSGID: 109023] [dht-rebalance.c:844:__dht_rebalance_create_dst_file] 0-tier2-dht: fallocate failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc on tier2-disperse-10 (Operation not supported)
>>>> [2018-09-12 13:15:50.777241] E [MSGID: 0] [dht-rebalance.c:1696:dht_migrate_file] 0-tier2-dht: Create dst failed on - tier2-disperse-10 for file - /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>> [2018-09-12 13:15:50.777676] E [MSGID: 109023] [dht-rebalance.c:2733:gf_defrag_migrate_single_file] 0-tier2-dht: migrate-data failed for /CSP/sp1/CESM/archive/sps_200508_003/atm/hist/postproc/sps_200508_003.cam.h0.2006-01_grid.nc
>>>> 
>>>> Could you please help me understand what is happening and how to solve
>>>> it?
>>>> 
>>>> Our Gluster deployment is based on Gluster v3.10.5.
>>>> 
>>>> Thank you in advance,
>>>> Mauro
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> [email protected]
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>> 
>>> 
>>> 
>>> -------------------------
>>> Mauro Tridici
>>> 
>>> Fondazione CMCC
>>> CMCC Supercomputing Center
>>> presso Complesso Ecotekne - Università del Salento -
>>> Strada Prov.le Lecce - Monteroni sn
>>> 73100 Lecce  IT
>>> http://www.cmcc.it
>>> 
>>> mobile: (+39) 327 5630841
>>> email: [email protected]
>>> 
>> 
> 

