Hi Florian,

> On 17 Sep 2014, at 17:09, Florian Haas <flor...@hastexo.com> wrote:
> 
> Hi Craig,
> 
> just dug this up in the list archives.
> 
> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis <cle...@centraldesktop.com> 
> wrote:
>> In the interest of removing variables, I removed all snapshots on all pools,
>> then restarted all ceph daemons at the same time.  This brought up osd.8 as
>> well.
> 
> So just to summarize this: your 100% CPU problem at the time went away
> after you removed all snapshots, and the actual cause of the issue was
> never found?
> 
> I am seeing a similar issue now, and have filed
> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
> again. Can you take a look at that issue and let me know if anything
> in the description sounds familiar?


Could your ticket be related to the snap trimming issue I've finally narrowed 
down over the past couple of days?

  http://tracker.ceph.com/issues/9487

Bump debug_osd up to 20, then check the log during one of your incidents. If 
the OSD is busy logging snap_trimmer messages, then it's the same issue. The 
root cause is that rbd pools accumulate many purged_snaps, but sometimes after 
backfilling a PG the purged_snaps list is lost, so the snap trimmer becomes 
very busy re-trimming thousands of snaps. During that time (a few minutes on 
my cluster) the OSD is blocked.
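In case it helps, here's roughly how I'd check (a sketch, not a recipe -- the 
OSD id and log path are just examples, adjust for your setup):

```shell
# Raise debug logging on a suspect OSD at runtime (no restart needed).
# osd.8 is an example id; pick one of the OSDs that spins at 100% CPU.
ceph tell osd.8 injectargs '--debug-osd 20'

# During the incident, watch the OSD log for snap trimmer activity.
# If it's scrolling with snap_trimmer lines, it's the same issue.
grep snap_trimmer /var/log/ceph/ceph-osd.8.log | tail -n 20

# Drop the debug level back down afterwards, the logs get huge at 20.
ceph tell osd.8 injectargs '--debug-osd 0/5'
```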

Cheers, Dan



> 
> You mentioned in a later message in the same thread that you would
> keep your snapshot script running and "repeat the experiment". Did the
> situation change in any way after that? Did the issue come back? Or
> did you just stop using snapshots altogether?
> 
> Cheers,
> Florian

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com