Hi,

before I get to my questions, I want to thank you for the good work
done on Ceph. I learned about Ceph in an Admin-Magazin article [1] and
was surprised how easy it was to set up Ceph by following the article.
Trying new software without hitting any errors, warnings or other
problems is very rare, and I was very impressed by the easy
installation and configuration.
Later on I had some smaller problems when I tried to increase the
number of mons and osds and to add a standby mds, but I managed to
figure them out using the man pages and the web.
Now I have a problem that I don't know how to fix.
First, some information about my setup:
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
ceph.conf
------------
[global]
; Enable authentication between hosts within the cluster.
auth supported = cephx
keyring = /etc/ceph/$name.keyring
[mon]
mon data = /srv/mon.$id
[mds]
[osd]
osd data = /srv/osd.$id
osd journal = /srv/osd.$id.journal
osd journal size = 1000
[mon.b]
host = hpb020102
mon addr = 10.23.3.2:6789
[mon.c]
host = hpb020103
mon addr = 10.23.3.3:6789
[mon.d]
host = hpb020104
mon addr = 10.23.3.4:6789
[mon.e]
host = hpb020105
mon addr = 10.23.3.5:6789
[mon.f]
host = hpb020106
mon addr = 10.23.3.6:6789
[osd.2]
host = hpb020102
[osd.3]
host = hpb020103
[osd.4]
host = hpb020104
[osd.5]
host = hpb020105
[osd.6]
host = hpb020106
[mds.a]
host = hpb020104
[mds.b]
host = hpb020105
mds standby replay = true
------------
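To double-check which paths these settings resolve to for each daemon,
here is a minimal Python sketch. It assumes the usual metavariable
expansion ($name becomes type.id, $id becomes the id) and the lookup
order daemon section, then type section, then [global]. The helper
names load_conf, expand and lookup are made up for illustration, the
sample is an abbreviated copy of the config above, and the sketch only
shows what the config asks for, not which keyring a monitor actually
reads.

```python
import configparser

# Abbreviated copy of the ceph.conf above (only mon.e kept for brevity).
SAMPLE = """
[global]
; Enable authentication between hosts within the cluster.
auth supported = cephx
keyring = /etc/ceph/$name.keyring
[mon]
mon data = /srv/mon.$id
[mon.e]
host = hpb020105
mon addr = 10.23.3.5:6789
"""


def load_conf(text):
    """Parse ceph.conf-style text; ';' comment lines are skipped by default."""
    # Only '=' is a delimiter, so the ':' in 'mon addr' values is left alone.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return parser


def expand(value, section):
    """Expand $name and $id for a daemon section such as 'mon.e'."""
    daemon_id = section.partition(".")[2]
    return value.replace("$name", section).replace("$id", daemon_id)


def lookup(parser, section, key):
    """Look up key in the daemon section, then its type section, then [global]."""
    daemon_type = section.partition(".")[0]
    for candidate in (section, daemon_type, "global"):
        if parser.has_option(candidate, key):
            return expand(parser.get(candidate, key), section)
    return None


conf = load_conf(SAMPLE)
for section in conf.sections():
    if section.startswith("mon."):
        print(section,
              lookup(conf, section, "host"),
              lookup(conf, section, "mon data"),
              lookup(conf, section, "keyring"))
```

Note that the sketch matches option names literally, so "mon data" and
"mon_data" would be treated as different keys here.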
/srv/osd.* are on an XFS partition.
Before my holiday I found log entries indicating that there might
be a problem with one of my mds daemons, which is still present:
2013-01-14 15:32:41.943304 mds e515692: 1/1/1 up
{0=a=up:active}, 1 up:standby-replay, 5 up:oneshot-replay(laggy or
crashed)
I tried to increase the log level to get some debug information. After
my holiday I found that the Ceph logs, mostly the mon log, had filled
my / filesystem. At first I thought that debugging was still active,
but on a closer look I found that somehow the mon. key could not be
found by mon.e:

2013-01-14 15:44:52.007632 7fad1e728700 0 mon.e@3(probing) e3 couldn't get secret for mon service
2013-01-14 15:44:52.007655 7fad17ee9700 0 mon.e@3(probing) e3 couldn't get secret for mon service
2013-01-14 15:44:52.007659 7fad1e728700 0 mon.e@3(probing) e3 no installed auth entries!
2013-01-14 15:44:52.007662 7fad17ee9700 0 mon.e@3(probing) e3 no installed auth entries!
2013-01-14 15:44:52.007860 7fad17ee9700 0 -- 10.23.3.5:6789/0 >> 10.23.3.3:6789/0 pipe(0x8e7190 sd=19 pgs=0 cs=0 l=0).connect got BADAUTHORIZER
2013-01-14 15:44:52.007860 7fad1e728700 0 -- 10.23.3.5:6789/0 >> 10.23.3.2:6789/0 pipe(0x8e6870 sd=18 pgs=0 cs=0 l=0).connect got BADAUTHORIZER

So I guess that, while trying to get more information, I somehow
managed to delete the mon. key. I was unable to retrieve the history
because of the full filesystem.

I then tried to use "ceph auth" and ceph-authtool to (re-)add the
mon. key, but only managed to make mon.d unable to authenticate as
well.

So far I know that I don't understand how cephx works.
"ceph auth list" shows the same key for mon. on all servers. But as it
takes longer on hpb020104 and hpb020105, I guess it contacts some
other mon servers, since mon.d and mon.e are out of quorum.

How can I get information about the mon. key for mon.d and mon.e if
they are not running / out of quorum?

How can I add or change the mon. key?

"/etc/ceph/" has keyrings for admin, client.admin, mds.* and osd.*,
but none for mon. or mon.*. Is this correct?

Best regards

Michael Menge

[1] http://www.admin-magazin.de/Das-Heft/2012/03/Der-RADOS-Objectstore-und-Ceph-Teil-1/%28language%29/ger-DE
--------------------------------------------------------------------------------
M.Menge                          Tel.: (49) 7071/29-70316
Universität Tübingen             Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung    mail: [email protected]
Wächterstraße 76
72074 Tübingen
