On 09/07/2012 19:14, Samuel Just wrote:
Can you restart the node that failed to complete the upgrade with

Well, it's a little bit complicated; I now run those nodes with XFS, and I have long-running jobs on them right now, so I can't stop the ceph cluster at the moment.

As I've kept the original broken btrfs volumes, I tried this morning to run the old osds in parallel, using the $cluster variable. I've only had partial success. I tried using different ports for the mons, but ceph wants to use the old mon map. I can edit it (epoch 1) but the mon seems to use 'latest' instead; that format isn't compatible with monmaptool, and I don't know how to inject the modified map into a non-running cluster.
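For what it's worth, the usual way to swap an edited map into a stopped mon is via ceph-mon's --extract-monmap / --inject-monmap options rather than editing the store files directly. A sketch, assuming a mon id of `a` and that the daemon is stopped (the address and paths are illustrative, not from this cluster):

```shell
# All of this runs with the mon daemon stopped.

# Extract the map the mon would actually use ("latest") into a plain monmap file.
ceph-mon -i a --extract-monmap /tmp/monmap

# Inspect and edit the extracted map with monmaptool.
monmaptool --print /tmp/monmap
monmaptool --rm a /tmp/monmap
monmaptool --add a 192.168.0.10:6790 /tmp/monmap   # new port -- illustrative address

# Push the modified map back into the mon's store, then start the mon.
ceph-mon -i a --inject-monmap /tmp/monmap
```

This sidesteps the epoch-1 vs. 'latest' problem, since the extracted file is already in the format monmaptool understands.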

Anyway, the osd seems to start fine, and I can reproduce the bug. I set:

debug filestore = 20
debug osd = 20

I've put these in [global]; is that sufficient?
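Concretely, the relevant part of my ceph.conf now looks like this (a sketch; the log file line is illustrative, only the two debug settings were added):

```ini
[global]
        ; verbose FileStore and OSD logging for the upgrade trace
        debug filestore = 20
        debug osd = 20
        ; illustrative only -- existing logging settings unchanged
        log file = /var/log/ceph/$cluster-$name.log
```

Since the settings are in [global] they apply to all daemons, including the osd in question; putting them under [osd] would scope them to the osds only.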


and post the log after an hour or so of running?  The upgrade process
might legitimately take a while.
-Sam
Only 15 minutes running, but ceph-osd is consuming lots of CPU, and an strace shows lots of pread.

Here is the log :

[..]
2012-07-10 11:33:29.560052 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount syncfs(2) syscall not support by glibc
2012-07-10 11:33:29.560062 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount no syncfs(2), but the btrfs SYNC ioctl will suffice
2012-07-10 11:33:29.560172 7f3e615ac780 -1 filestore(/CEPH-PROD/data/osd.1) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, performing disk format upgrade.
2012-07-10 11:33:29.560233 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount found snaps <3744666,3746725>
2012-07-10 11:33:29.560263 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) current/ seq was 3746725
2012-07-10 11:33:29.560267 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) most recent snap from <3744666,3746725> is 3746725
2012-07-10 11:33:29.560280 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) mount rolling back to consistent snap 3746725
2012-07-10 11:33:29.839281 7f3e615ac780 5 filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725


... and nothing more.

I'll let it run for 3 hours. If I get another message, I'll let you know.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html