Dear all,

Thanks for your feedback; I'll try to take every suggestion into consideration.

I've rebooted the node in question, and all 24 OSDs came back online without 
any complaints.

But what makes me wonder is: during the downtime, the objects got rebalanced and 
placed on the remaining nodes.

With the failed node back online, only a couple of hundred objects were 
misplaced, out of about 35 million.

The question for me is: what happens to the objects on the OSDs that went down, 
once those OSDs are back online?
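For anyone following along, the state of those objects can be watched while the cluster re-peers and recovers, using the standard ceph CLI (a sketch; the exact output fields vary by release):

```shell
# Overall cluster health, including degraded/misplaced object counts
ceph -s

# Per-PG summary: active+clean vs. recovering/backfilling PGs
ceph pg stat

# Verify all 24 OSDs on the rebooted host are up and in
ceph osd tree
```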

        Thanks for your feedback.
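Picking up the suggestions quoted below, here is a sketch of the knobs involved (the `ceph config set` syntax assumes Mimic or later; on Luminous the same option goes into ceph.conf under [mon], and the values here are assumptions to tune for your own cluster):

```shell
# Don't auto-mark OSDs "out" when a whole host goes down, so a
# full-host failure doesn't trigger rebalancing:
ceph config set mon mon_osd_down_out_subtree_limit host

# For a planned short outage, keep any OSD from being marked out:
ceph osd set noout
# ... reboot / repair the host ...
ceph osd unset noout

# To speed up recovery at the cost of client I/O, allow more
# concurrent backfills per OSD (the default is 1 on recent releases):
ceph tell osd.* injectargs '--osd-max-backfills 2'
```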


> On 27.01.2019 at 04:17, Christian Balzer <[email protected]> wrote:
> 
> 
> Hello,
> 
> this is where (depending on your topology) something like:
> ---
> mon_osd_down_out_subtree_limit = host
> ---
> can come in very handy.
> 
> Provided you have proper monitoring, alerting and operations, a down
> node can often be restored long before any recovery would have
> finished, and you also avoid the data moving back and forth.
> And if you see that restoring the node will take a long time, just
> manually mark things out for the time being.
> 
> Christian
> 
> On Sun, 27 Jan 2019 00:02:54 +0100 Götz Reinicke wrote:
> 
>> Dear Chris,
>> 
>> Thanks for your feedback. The node/OSDs in question are part of an erasure 
>> coded pool and during the weekend the workload should be close to none.
>> 
>> But anyway, I could get a look at the console and the server; the power 
>> is up, but I can't use the console: the login prompt is shown, but no 
>> key press is accepted.
>> 
>> I'll have to reboot the server and check what it is complaining about 
>> tomorrow morning, as soon as I can access the server again.
>> 
>>      Fingers crossed and regards. Götz
>> 
>> 
>> 
>>> On 26.01.2019 at 23:41, Chris <[email protected]> wrote:
>>> 
>>> It sort of depends on your workload/use case.  Recovery operations can be 
>>> computationally expensive.  If your load is light because it's the weekend, 
>>> you should be able to turn that host back on as soon as you resolve 
>>> whatever the issue is, with minimal impact.  You can also increase the 
>>> priority of the recovery operation to make it go faster, if you feel you 
>>> can spare the additional IO and it won't affect clients.
>>> 
>>> We do this in our cluster regularly and have yet to see an issue (given 
>>> that we take care to do it during periods of lower client IO).
>>> 
>>> On January 26, 2019 17:16:38 Götz Reinicke <[email protected]> 
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> one host out of 10 is down for as yet unknown reasons. I guess a power 
>>>> failure; I could not yet see the server.
>>>> 
>>>> The cluster is recovering and remapping fine, but still has some objects 
>>>> to process.
>>>> 
>>>> My question: may I just switch the server back on, and in the best case 
>>>> the 24 OSDs come back online and recovery will do the job without problems?
>>>> 
>>>> Or what might be a good way to handle that host? Should I first wait 
>>>> till the recovery is finished?
>>>> 
>>>> Thanks for feedback and suggestions. Happy Saturday night :) Regards, 
>>>> Götz  
>> 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer                
> [email protected]          Rakuten Communications

Götz Reinicke 
IT-Koordinator
IT-OfficeNet
+49 7141 969 82420 
[email protected]
Filmakademie Baden-Württemberg GmbH 
Akademiehof 10
71638 Ludwigsburg 
http://www.filmakademie.de


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
