Hi

Brilliant! I recovered my data.

Gregory, Joao, John, Samuel: thanks a lot for all the help, and for responding every time.

It's my fault for moving to 0.82. But it's good if that helped you find some bugs ;)

After this scare, we will recreate our cluster on Firefly. I think we'll change our infrastructure to NFS over RBD, but the transfer is likely to take a long time and we will lose the performance advantage of CephFS for computation.

Is there a better technique than rsync to transfer data between CephFS and RBD?
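One common pattern for this kind of migration (a sketch only; the temp directories below stand in for the real CephFS mount and RBD-backed target, which are not named in the thread) is a tar pipeline for the initial bulk copy, which skips rsync's per-file comparison overhead, followed by rsync for the incremental passes:

```shell
# Hypothetical stand-ins: in practice SRC would be the CephFS mount
# and DST the filesystem on the RBD image.
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo "sample" > "$SRC/file.txt"

# Initial bulk copy: stream a tar archive straight into the target,
# preserving permissions (-p). No per-file metadata comparison.
tar -C "$SRC" -cf - . | tar -C "$DST" -xpf -

# Incremental follow-up passes (and the final cut-over while the old
# filesystem is quiesced) are where rsync earns its keep:
#   rsync -aH "$SRC/" "$DST/"

cat "$DST/file.txt"   # -> sample
```

The split matters because rsync's strength is re-syncing an existing copy; for a first full copy of a large tree, the plain streaming pipeline is usually at least as fast.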

For the moment, I have restarted only one MDS. Can I restart the other MDSs, or is that dangerous?

I still have two PGs in the inconsistent state; how can I resolve that?
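For inconsistent PGs, the usual first step (a command fragment, not runnable outside the cluster; the PG IDs are taken from the `ceph pg dump` output quoted later in this thread) is to re-scrub and then ask the primary to repair from its replicas:

```shell
# Sketch of the standard inconsistent-PG workflow; understand *why* the
# PGs went inconsistent (e.g. a failing disk on osd.34) before repairing.
ceph pg scrub 0.2e      # confirm the inconsistency persists
ceph pg repair 0.2e     # primary repairs the PG from replica copies
ceph pg repair 0.1ed
ceph -s                 # watch for the PGs to return to active+clean
```

Since both PGs share osd.34 as primary, checking that OSD's disk and logs before repairing is worthwhile.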

Regards,
Pierre

PS: the command I ran:

# cephfs-journal-tool journal reset
old journal was 25356971019142~18446718717516473070
new journal start will be 780140544 (2199948 bytes past old end)
writing journal head
writing EResetJournal entry
done

On 17/07/2014 15:38, John Spray wrote:
Hi Pierre,

Unfortunately it looks like we had a bug in 0.82 that could lead to
journal corruption of the sort you're seeing here.  A new journal
format was added, and on the first start after an update the MDS would
re-write the journal to the new format.  This should only have been
happening on the single active MDS for a given rank, but it was
actually being done by standby-replay MDS daemons too.  As a result,
if there were standby-replay daemons configured, they could try to
rewrite the journal at the same time, resulting in a corrupt journal.

In your case, I think the probability of the condition occurring was
increased by the OSD issues you were having, because at some earlier
stage the rewrite process had been stopped partway through.  Without
standby MDSs this would be recovered from cleanly, but with the
standbys in play the danger of corruption is high while the journal is
in the partly-rewritten state.
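A toy model can make the failure mode concrete (illustrative only; this is not Ceph's on-disk format or code). Each rewrite has two steps: move the entries to a new offset, then update the header to point there. When two daemons race, the header can end up pointing where the entries no longer are, which is exactly the "inconsistent offsets" the journal tool reports:

```python
# Toy model of two MDS daemons racing to rewrite the same journal.

def rewrite_steps(new_start):
    """The two ordered steps of one daemon's format rewrite."""
    yield ("write_entries", new_start)  # copy entries to new_start
    yield ("write_header", new_start)   # then record new_start in header

def apply(journal, step):
    kind, start = step
    key = "entries_at" if kind == "write_entries" else "header_start"
    journal[key] = start

journal = {"header_start": 0, "entries_at": 0}
a = rewrite_steps(100)  # active MDS targets offset 100
b = rewrite_steps(200)  # standby-replay daemon races, targets offset 200

# Unlucky interleaving: A moves the entries, B moves them again,
# A writes its header, and B never finishes.
for step in (next(a), next(b), next(a)):
    apply(journal, step)

print(journal)  # {'header_start': 100, 'entries_at': 200}
print(journal["header_start"] == journal["entries_at"])  # False: corrupt
```

With a single writer the two steps always land together; it is only the interleaving that tears the header apart from the entries.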

The ticket is here: http://tracker.ceph.com/issues/8811
The candidate fix is here: https://github.com/ceph/ceph/pull/2115

If you have recent backups then I would suggest recreating the
filesystem and restoring from backups.  You can also try using the
"cephfs-journal-tool journal reset" command, which will wipe out the
journal entirely, losing the most recent writes to the filesystem and
potentially leaving some stray objects in the data pool.

Sorry that this has bitten you; even though 0.82 was not a named
release, this was a pretty nasty bug to let out there, and I'm going to
improve our automated tests in this area.

Regards,
John


On Wed, Jul 16, 2014 at 11:57 PM, Pierre BLONDEAU
<[email protected]> wrote:
On 16/07/2014 22:40, Gregory Farnum wrote:

On Wed, Jul 16, 2014 at 6:21 AM, Pierre BLONDEAU
<[email protected]> wrote:

Hi,

After the repair process, I have:
1926 active+clean
     2 active+clean+inconsistent

These two PGs seem to be on the same OSD (#34):
# ceph pg dump | grep inconsistent
dumped all in format plain
0.2e    4  0  0  0  8388660  4   4   active+clean+inconsistent  2014-07-16 11:39:43.819631  9463'4   438411:133968  [34,4]  34  [34,4]  34  9463'4   2014-07-16 04:52:54.417333  9463'4   2014-07-11 09:29:22.041717
0.1ed   5  0  0  0  8388623  10  10  active+clean+inconsistent  2014-07-16 11:39:45.820142  9712'10  438411:144792  [34,2]  34  [34,2]  34  9712'10  2014-07-16 09:12:44.742488  9712'10  2014-07-10 21:57:11.345241

Could that explain why my MDS won't start? If I remove (or shut down)
this OSD, could that solve my problem?


You want to figure out why they're inconsistent (if they're still
going inconsistent, or maybe just need to be repaired), but this
shouldn't be causing your MDS troubles.
Can you dump the MDS journal and put it somewhere accessible? (You can
use ceph-post-file to upload it.) John has been trying to reproduce
this crash but hasn't succeeded yet.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
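For reference, the `ceph-post-file` upload Greg mentions takes the file plus an optional description and prints back an identifier to quote on the list (a command fragment; the filename matches the export attempted below):

```shell
# Uploads to the Ceph developers' drop point and prints a UUID
# to paste back into the thread.
ceph-post-file -d "MDS journal for tracker triage" ceph-journal.bin
```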


Hi,

I tried to run:
cephfs-journal-tool journal export ceph-journal.bin 2>
cephfs-journal-tool.log

But the program crashed. I uploaded the log file:
e069c6ac-3cb4-4a52-8950-da7c600e2b01

There is a mistake in
http://ceph.com/docs/master/cephfs/cephfs-journal-tool/ in "Example: journal
inspect". The correct syntax seems to be:
# cephfs-journal-tool  journal inspect
2014-07-17 00:54:14.155382 7ff89d239780 -1 Header is invalid (inconsistent
offsets)
Overall journal integrity: DAMAGED
Header could not be decoded

Regards


--
----------------------------------------------
Pierre BLONDEAU
Administrateur Systèmes & réseaux
Université de Caen
Laboratoire GREYC, Département d'informatique

tel     : 02 31 56 75 42
bureau  : Campus 2, Science 3, 406
----------------------------------------------




