In addition, I was able to extract some logs from the last time the active/peering problem happened: http://pastebin.com/BakFREFP (it ends with me restarting the OSD).
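Next time it happens I'll try to capture some state before restarting anything. Roughly something like the following (a rough sketch; the pg id and the osd number are placeholders, the real pg ids would come from the stuck entries in the dump):

    # snapshot the overall state and the full pg table
    ceph -s > /tmp/ceph-status.txt
    ceph health detail > /tmp/ceph-health.txt
    ceph pg dump > /tmp/pg-dump.txt

    # query one of the pgs stuck in active or peering (2.1f is a placeholder)
    ceph pg 2.1f query > /tmp/pg-2.1f.json

    # only then restart the suspect OSD (osd.12 is a placeholder)
    service ceph restart osd.12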
On Mon, Apr 1, 2013 at 10:23 AM, Erdem Agaoglu <[email protected]> wrote:

> Hi all,
>
> We are currently in the process of enlarging our bobtail cluster by
> adding OSDs. We have 12 disks per machine and we create one OSD per
> disk, adding them one by one as recommended. The only thing we don't do
> is start with a small weight and increase it gradually; all weights are 1.
>
> In this scenario both rbd and radosgw are unresponsive for only the
> first two minutes after a new OSD is added. After that small hiccup we
> have some pgs in states like active+remapped+wait_backfill,
> active+remapped+backfilling, active+recovery_wait+remapped and
> active+degraded+remapped+backfilling, and everything works OK. After a
> few hours of backfilling and recovery all pgs become active+clean and
> we add another OSD.
>
> But sometimes that small hiccup takes longer than a few minutes. At
> those times the status shows some pgs stuck in active and some stuck in
> peering. When we look at the pg dump we see that all of those active or
> peering pgs are on the same 2 OSDs and are unable to move forward. At
> this stage rbd works poorly and radosgw is completely stalled. Only
> after we restart one of those 2 OSDs do the pgs start to backfill and
> the clients continue with their operations.
>
> Since this is a live cluster we don't want to wait too long, so we
> usually restart the OSD in a hurry. That's why I cannot currently
> provide status or pg query outputs. We have some logs, but I don't know
> what to look for or whether they are verbose enough.
>
> Could this be some kind of known issue? If not, where should I look to
> get an idea of what's happening when it occurs?
>
> Thanks in advance
>
> --
> erdem agaoglu

--
erdem agaoglu
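P.S. Regarding the weighting: if it turns out to matter, the gradual approach we currently skip would look roughly like this (an untested sketch; osd.12, the step size and the crude grep check are all made up):

    # ramp the crush weight of the new OSD in 0.2 steps, waiting for
    # backfill/recovery to settle before each bump
    for w in 0.2 0.4 0.6 0.8 1.0; do
        ceph osd crush reweight osd.12 $w
        # crude wait: loop until ceph -s no longer reports pgs in
        # peering/backfill/recovery states
        while ceph -s | grep -qE 'peering|backfill|recover'; do
            sleep 60
        done
    done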
