On 09/07/2012 19:14, Samuel Just wrote:
Can you restart the node that failed to complete the upgrade with

Well, it's a little bit complicated; I now run those nodes with XFS, and I have long-running jobs on them right now, so I can't stop the ceph cluster at the moment.

As I've kept the original broken btrfs volumes, I tried this morning to run the old osds in parallel, using the $cluster variable. I've only had partial success. I tried using different ports for the mons, but ceph wants to use the old mon map. I can edit it (epoch 1) but the mon seems to use 'latest' instead; that format isn't compatible with monmaptool, and I don't know how to inject the modified map into a non-running cluster.
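For what it's worth, the usual way to swap an edited map into a stopped mon is via ceph-mon's --extract-monmap / --inject-monmap options rather than editing the store files directly. A sketch, assuming a mon id of `a` and that the daemon is stopped (the address and paths are illustrative, not from this cluster):

```shell
# All of this runs with the mon daemon stopped.

# Extract the map the mon would actually use ("latest") into a plain monmap file.
ceph-mon -i a --extract-monmap /tmp/monmap

# Inspect and edit the extracted map with monmaptool.
monmaptool --print /tmp/monmap
monmaptool --rm a /tmp/monmap
monmaptool --add a 192.168.0.10:6790 /tmp/monmap   # new port -- illustrative address

# Push the modified map back into the mon's store, then start the mon.
ceph-mon -i a --inject-monmap /tmp/monmap
```

This sidesteps the epoch-1 vs. 'latest' problem, since the extracted file is already in the format monmaptool understands.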

Anyway, the osd seems to start fine, and I can reproduce the bug. I set:

debug filestore = 20
debug osd = 20

I've put these in [global]; is that sufficient?
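Concretely, the relevant part of my ceph.conf now looks like this (a sketch; the log file line is illustrative, only the two debug settings were added):

```ini
[global]
        ; verbose FileStore and OSD logging for the upgrade trace
        debug filestore = 20
        debug osd = 20
        ; illustrative only -- existing logging settings unchanged
        log file = /var/log/ceph/$cluster-$name.log
```

Since the settings are in [global] they apply to all daemons, including the osd in question; putting them under [osd] would scope them to the osds only.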


and post the log after an hour or so of running?  The upgrade process
might legitimately take a while.
-Sam
Only 15 minutes running, but ceph-osd is consuming lots of CPU, and an strace shows lots of pread.

Here is the log :

[..]
2012-07-10 11:33:29.560052 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount syncfs(2) syscall not support by glibc
2012-07-10 11:33:29.560062 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount no syncfs(2), but the btrfs SYNC ioctl will suffice
2012-07-10 11:33:29.560172 7f3e615ac780 -1 filestore(/CEPH-PROD/data/osd.1) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, performing disk format upgrade.
2012-07-10 11:33:29.560233 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount found snaps <3744666,3746725>
2012-07-10 11:33:29.560263 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) current/ seq was 3746725
2012-07-10 11:33:29.560267 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) most recent snap from <3744666,3746725> is 3746725
2012-07-10 11:33:29.560280 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) mount rolling back to consistent snap 3746725
2012-07-10 11:33:29.839281 7f3e615ac780 5 filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725


... and nothing more.

I'll let it run for 3 hours. If I get another message, I'll let you know.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : yann.dup...@univ-nantes.fr

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html