So our datacenter lost power and two of our three monitors died with
filesystem corruption. I tried fixing them, but it looks like the store.db
didn't make it.

I copied the working journal via the following steps (a filled-in example
is below the list):

   1. sudo mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}

   2. sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename}
      --keyring {tmp}/{key-filename}

   3. ceph-mon -i `hostname` --extract-monmap /tmp/monmap

   4. ceph-mon -i {mon-id} --inject-monmap {map-path}

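To be concrete, here is roughly what that looked like with the placeholders
filled in. The exact paths are ones I've filled in for illustration; in
particular, the keyring location is an assumption based on the default
/var/lib/ceph/mon/ceph-<id>/keyring layout:

   # On the surviving monitor, with its ceph-mon daemon stopped:
   sudo ceph-mon -i $(hostname) --extract-monmap /tmp/monmap
   sudo cp /var/lib/ceph/mon/ceph-$(hostname)/keyring /tmp/keyring

   # On each dead monitor, move the corrupted store aside and rebuild it
   # from the copied monmap and keyring:
   sudo mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}
   sudo ceph-mon -i $(hostname) --mkfs --monmap /tmp/monmap --keyring /tmp/keyring

   # I also injected the monmap explicitly on the rebuilt monitors:
   sudo ceph-mon -i $(hostname) --inject-monmap /tmp/monmap
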
For a brief moment I had a quorum, but any ceph CLI command would result in
cephx errors. Now the two failed monitors have elected a quorum between
themselves, and the monitor that was working keeps getting kicked out of
the cluster:


'''
{
    "election_epoch": 402,
    "quorum": [
        0,
        1
    ],
    "quorum_names": [
        "kh11-8",
        "kh12-8"
    ],
    "quorum_leader_name": "kh11-8",
    "monmap": {
        "epoch": 1,
        "fsid": "a6ae50db-5c71-4ef8-885e-8137c7793da8",
        "modified": "0.000000",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "kh11-8",
                "addr": "10.64.64.134:6789\/0"
            },
            {
                "rank": 1,
                "name": "kh12-8",
                "addr": "10.64.64.143:6789\/0"
            },
            {
                "rank": 2,
                "name": "kh13-8",
                "addr": "10.64.64.151:6789\/0"
            }
        ]
    }
}
'''
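
I also want to confirm that all three daemons are actually running with the
same monmap, since the map in that output shows epoch 1 with zeroed
created/modified timestamps. My rough plan, assuming the standard systemd
unit names (adjust if you're on sysvinit) and that briefly stopping a mon
is acceptable:

   # On each monitor host: stop the mon so the store isn't locked,
   # dump the monmap it is using, then start it again
   sudo systemctl stop ceph-mon@$(hostname)
   sudo ceph-mon -i $(hostname) --extract-monmap /tmp/monmap.$(hostname)
   monmaptool --print /tmp/monmap.$(hostname)
   sudo systemctl start ceph-mon@$(hostname)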

At this point I am not sure what to do, as any ceph command returns cephx
errors and I can't verify whether the new "quorum" is actually valid.
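
Is querying the local admin socket on each monitor host a sane way to
check? As far as I understand it doesn't go through cephx, so something
like this should still answer:

   # Run locally on each mon host; the socket path assumes the default run dir
   sudo ceph daemon mon.$(hostname) mon_status
   sudo ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname).asok quorum_status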

Is there any way to regenerate a cephx authentication key, or to recover it
given hardware access to the nodes? And does anyone have advice on how to
recover from what seems to be a complete monitor failure?
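
One thing I have considered but not yet tried: if the mon. keyring is still
at its default location on a monitor that is in quorum, authenticating as
mon. to pull (or recreate) client.admin, roughly:

   # On a monitor host that is in quorum; keyring path assumes the default layout
   sudo ceph -n mon. \
        --keyring /var/lib/ceph/mon/ceph-$(hostname)/keyring \
        auth get client.admin

Is that safe/sensible here, or is there a better supported recovery path?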


-- 
- Sean:  I wrote this. -
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
