>> After looking at the OSDs, we saw a very high load on them (a load of
>> 450), and some were down.
>> "ceph -s" showed that we had down PGs, peering+down PGs, remapped PGs,
>> etc.
>
> Could you tell us a bit more? When the load was 450, was this mainly due
> to disk I/O wait? Did the machines start to swap?

All disks were 100% busy, and the servers were swapping.

> Could it be that the swapping was actually causing the machines to die
> even more? Although an OSD can run with 100 MB of memory, during
> recovery it can grow quite fast.

Is there a way to estimate the needed memory?
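
A rough rule of thumb (an assumption, not a figure from this exchange) is
on the order of 1 GB of RAM per OSD daemon, with noticeably more needed
during recovery; usage also grows with the number of PGs each OSD carries.
The most reliable estimate is simply to watch the daemons' resident size
while a recovery is running, e.g.:

  # resident set size (KiB) of every running ceph-osd daemon
  ps -C ceph-osd -o pid,rss,args

Sizing RAM (and keeping the hosts out of swap) for the peak seen there,
plus some headroom, is safer than any static formula.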

> So basically the cluster was under load because we were recovering...
> but because it was under load, recovery could not complete.
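
The usual way out of that loop is to throttle recovery so that client I/O
(and memory) can keep up. Exact option names and sensible values depend on
the Ceph version, so treat the following only as a sketch:

  # limit concurrent backfill/recovery work per OSD and lower its priority
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
  ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

The same options can be made persistent in the [osd] section of ceph.conf
(e.g. "osd max backfills = 1").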

> FileStore aborts indicate that it couldn't get the work done quickly
> enough. I've seen this with btrfs, but you say you are using XFS.
>
> You say you are storing small files. What exactly is "small"?

On average, 120 KB.

--
Yann ROBIN
www.YouScribe.com