On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
<franc...@ctrlaltdel.ch> wrote:
> Hi Greg,
>
> Thanks for your support!
>
> On 08. 09. 14 20:20, Gregory Farnum wrote:
>
>> The first one is not caused by the same thing as the ticket you
>> reference (it was fixed well before emperor), so it appears to be some
>> kind of disk corruption.
>> The second one is definitely corruption of some kind as it's missing
>> an OSDMap it thinks it should have. It's possible that you're running
>> into bugs in emperor that were fixed after we stopped doing regular
>> support releases of it, but I'm more concerned that you've got disk
>> corruption in the stores. What kind of crashes did you see previously?
>> Are there any relevant messages in dmesg, etc.?
>
> Nothing special in dmesg except probably irrelevant XFS warnings:
>
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

Hmm, I'm not sure what the outcome of that could be. Googling for the
error message returns this as the first result, though:
http://comments.gmane.org/gmane.comp.file-systems.xfs.general/58429
That thread indicates it's a real deadlock, capable of messing up
your OSDs pretty badly.

>
> All logs from before the disaster are still there. Do you have any
> advice on what would be relevant?
>
>> Given these issues, you might be best off identifying exactly which
>> PGs are missing, carefully copying them to working OSDs (use the osd
>> store tool), and killing these OSDs. Do lots of backups at each
>> stage...
>
> This sounds scary; I'll keep my fingers crossed and make plenty of
> backups. There are 17 PGs with missing objects.
>
> What exactly do you mean by the osd store tool? Is it the
> 'ceph_filestore_tool' binary?

Yeah, that one.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
