Hi,

As discussed during FOSDEM, the script you wrote to kill the OSD when it grows 
too much could be amended to make it dump core instead of just killing and 
restarting it. The binary and the core could then probably be used to figure 
out where the leak is.

You should make sure the OSD's current working directory is on a file system 
with enough free disk space to accommodate the dump, and set

ulimit -c unlimited

before running it (your system default is probably ulimit -c 0, which inhibits 
core dumps). When you detect that the OSD has grown too much, kill it with

kill -SEGV $pid

and upload the core found in the working directory, together with the binary, 
to a public place. If the osd binary is compiled with -g but without changing 
the -O settings, you should get a larger binary file but no negative impact on 
performance. Forensic analysis will be made a lot easier with the debugging 
symbols.
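
For what it's worth, here is a rough sketch of how the check in your script 
could look. I have not seen your script, so the function names, the OSD_PID 
and LIMIT_KB variables and the 4 GiB threshold are all made up for 
illustration, and reading VmRSS from /proc assumes Linux.

```shell
#!/bin/sh
# Hypothetical sketch: names and the threshold are assumptions, not taken
# from the existing watchdog script.
LIMIT_KB=${LIMIT_KB:-4194304}   # assumed limit: 4 GiB of resident memory

# Print the resident set size (in kB) of process $1, read from /proc (Linux).
rss_kb() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value _; do
        if [ "$key" = "VmRSS:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Succeed when RSS $1 (kB) is known and strictly greater than limit $2 (kB).
over_limit() {
    [ -n "$1" ] && [ "$1" -gt "$2" ]
}

# Send SIGSEGV so the process dumps core instead of just being killed.
dump_core() {
    kill -SEGV "$1"
}

# Only act when a PID is supplied, e.g. OSD_PID=1234 sh check-osd.sh
if [ -n "${OSD_PID:-}" ]; then
    rss=$(rss_kb "$OSD_PID")
    if over_limit "$rss" "$LIMIT_KB"; then
        dump_core "$OSD_PID"
    fi
fi
```

The ulimit -c unlimited has to be set in the shell that starts the OSD, not in 
this one. Also, if kernel.core_pattern has been changed on your system, the 
core may end up somewhere other than the working directory.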

My 2cts

On 01/31/2013 08:57 PM, Sage Weil wrote:
> On Thu, 31 Jan 2013, Sylvain Munaut wrote:
>> Hi,
>>
>> I disabled scrubbing using
>>
>>> ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
>>> ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
>>
>> and the leak seems to be gone.
>>
>> See the graph at  http://i.imgur.com/A0KmVot.png  with the OSD memory
>> for the 12 osd processes over the last 3.5 days.
>> Memory was rising every 24h. I did the change yesterday around 13h00
>> and OSDs stopped growing. OSD memory even seems to go down slowly by
>> small blocks.
>>
>> Of course I assume disabling scrubbing is not a long term solution and
>> I should re-enable it ... (how do I do that btw ? what were the
>> default values for those parameters)
> 
> It depends on the exact commit you're on.  You can see the defaults if you 
> do
> 
>  ceph-osd --show-config | grep osd_scrub
> 
> Thanks for testing this... I have a few other ideas to try to reproduce.  
> 
> sage

-- 
Loïc Dachary, Artisan Logiciel Libre
