On 10/22/2014 07:41 PM, Andrey Korolyov wrote:
Hello,
on a small test cluster, the following sequence left a freshly
formatted OSD unable to rejoin the cluster:
- update cluster sequentially from cuttlefish to dumpling to firefly,
- execute tunables change, wait for recovery completion,
- shut down a single OSD, reformat its filestore and journal,
- start it back up (auth caps and key remained the same).
Version is 5a10b95f7968ecac1f2af4abf9fb91347a290544. Any ideas as to
why this happens are very welcome. I suspect the root cause is some
resource that the OSD keeps requesting continuously from line 29499
of the strace onward (probably earlier, but that line clearly
separates the init stage from the loop in the log). It is something
that happens just after journal and collections initialization, but
I have no idea what it may be.
Thanks!
Strace: http://xdel.ru/downloads/osd0.out.gz
Can you send us your ceph.conf (edit away any sensitive information you
may have), the log for the osd you are having trouble with (with 'debug
monc = 10' and 'debug ms = 1'), and the log for your monitors (with
'debug mon = 10', 'debug ms = 1')?
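
As a rough sketch, the debug levels requested above could be set in
ceph.conf along these lines. The use of the generic [osd] and [mon]
sections is an assumption; the settings can instead go under the
section for the specific daemon if you only want verbose logging
from the one that is misbehaving.

```
# Sketch only: section placement is an assumption, adjust to your setup.
[osd]
    debug monc = 10
    debug ms = 1

[mon]
    debug mon = 10
    debug ms = 1
```

The daemons need to pick up the new values (for example, by being
restarted) before the failing OSD start is reproduced, so that the
logs cover the join attempt.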
-Joao
--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com