On 09/07/2012 19:43, Tommi Virtanen wrote:
> On Wed, Jul 4, 2012 at 1:06 AM, Yann Dupont <yann.dup...@univ-nantes.fr> wrote:
>> Well, I probably wasn't clear enough. I talked about a crashed FS, but I was
>> talking about Ceph. The underlying FS (btrfs in this case) of one node (and
>> only one) has PROBABLY crashed in the past, corrupting the Ceph data on this
>> node and then causing the subsequent crash of other nodes.
>>
>> RIGHT now btrfs on this node is OK. I can access the filesystem without
>> errors.
> But the LevelDB isn't. Its contents got corrupted, somehow, somewhere,
> and it really is up to the LevelDB library to tolerate those errors;
> we have a simple get/put interface we use, and LevelDB is triggering
> an internal error.
Yes, understood.

>> One node had a problem with btrfs, leading first to kernel problems, probably
>> corruption (on disk / in memory, maybe?), and ultimately to a kernel oops.
>> Before that ultimate kernel oops, bad data was transmitted to other
>> (sane) nodes, leading to ceph-osd crashes on those nodes.
> The LevelDB binary contents are not transferred over to other nodes;
> this kind of corruption would not spread over the Ceph clustering
> mechanisms. It's more likely that you have 4 independently corrupted
> LevelDBs. Something in the workload Ceph runs makes that corruption
> quite likely.
OK, thanks for the clarification. Very likely: since I reformatted my nodes with XFS, I haven't had any problems so far.

> The information here isn't enough to say whether the cause of the
> corruption is btrfs or LevelDB, but the recovery needs to be handled by
> LevelDB -- and upstream is working on making it more robust:
> http://code.google.com/p/leveldb/issues/detail?id=97
Yes, I saw this. It's very important. Sometimes, s... happens. Given the size Ceph volumes can reach, a tool to restart damaged nodes (for whatever reason) is a must.

Thanks for the time you took to answer. It's much clearer for me now.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
