On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont <yann.dup...@univ-nantes.fr> wrote:
> Well, I probably wasn't clear enough. I talked about a crashed FS, but I
> was talking about ceph. The underlying FS (btrfs in that case) of one node
> (and only one) has PROBABLY crashed in the past, causing corruption in the
> ceph data on this node, and then the subsequent crash of other nodes.
>
> RIGHT now btrfs on this node is OK. I can access the filesystem without
> errors.

But the LevelDB isn't. Its contents got corrupted somehow, somewhere,
and it really is up to the LevelDB library to tolerate those errors;
we use a simple get/put interface, and LevelDB is triggering an
internal error.
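
For context, this is roughly the level we operate at -- a minimal
standalone sketch against the LevelDB C++ API (the path and key below
are made up, not what the OSD actually uses):

  #include <string>
  #include "leveldb/db.h"

  int main() {
    leveldb::DB* db = nullptr;
    leveldb::Options options;
    options.create_if_missing = true;

    // Open the store, then do plain put/get -- that's about all we
    // ask of LevelDB.
    leveldb::Status s = leveldb::DB::Open(options, "/tmp/example-omap", &db);
    if (s.ok())
      s = db->Put(leveldb::WriteOptions(), "some-key", "some-value");
    std::string value;
    if (s.ok())
      s = db->Get(leveldb::ReadOptions(), "some-key", &value);
    if (!s.ok()) {
      // A corrupted sstable or log file surfaces here as a Status like
      // "Corruption: ...", even though the caller did nothing wrong.
    }
    delete db;
    return 0;
  }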

> One node had a problem with btrfs, leading first to kernel problems,
> probably corruption (on disk / in memory maybe?), and ultimately to a
> kernel oops. Before that final oops, bad data was transmitted to other
> (sane) nodes, leading to ceph-osd crashes on those nodes.

The LevelDB binary contents are not transferred over to other nodes;
this kind of corruption would not spread over the Ceph clustering
mechanisms. It's more likely that you have 4 independently corrupted
LevelDBs. Something in the workload Ceph runs makes that corruption
quite likely.

The information here isn't enough to say whether the cause of the
corruption is btrfs or LevelDB, but the recovery needs to be handled by
LevelDB -- and upstream is working on making it more robust:
http://code.google.com/p/leveldb/issues/detail?id=97
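
If you want to try to salvage an existing store offline in the
meantime, LevelDB ships a repair entry point (leveldb::RepairDB). A
rough sketch -- the omap path below is only an example, adjust it to
your OSD layout, and work on a copy of the directory since repair can
still drop records:

  #include <iostream>
  #include "leveldb/db.h"

  int main() {
    leveldb::Options options;
    // RepairDB rewrites whatever it can still read into a consistent
    // store; anything unreadable is discarded, so back up the
    // directory before running this.
    leveldb::Status s =
        leveldb::RepairDB("/path/to/osd/current/omap", options);
    if (!s.ok()) {
      std::cerr << "repair failed: " << s.ToString() << std::endl;
      return 1;
    }
    return 0;
  }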