My problem was that commands like 'ceph -s' failed to connect,

and therefore I couldn't extract the monmap that way.

I could get it from the running monitor process though, and I used it
along with the documentation and the example of what a monmap
looks like in order to create a new one and inject it into the
second monitor.
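
In rough terms, the sequence was something like the following (just a
sketch; the monitor names fu/jin, the addresses, and the fsid are the
ones from the logs in this thread, so they will differ on another setup):

   # stop the monitor before touching its monmap
   /etc/init.d/ceph stop mon.jin

   # build a fresh monmap containing both monitors, using the fsid
   # reported in the mon log
   monmaptool --create --clobber \
       --fsid a1132ec2-7104-4e8e-a3d5-95965cae9138 \
       --add fu 192.168.1.100:6789 \
       --add jin 192.168.1.101:6789 \
       /tmp/monmap

   # sanity-check the new map, inject it into the second monitor, restart
   monmaptool --print /tmp/monmap
   ceph-mon -i jin --inject-monmap /tmp/monmap
   /etc/init.d/ceph start mon.jin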

I believe that this was the action that solved my problems... not quite confident though :-(


Thanks a lot to everyone who spent some time dealing with my problem!

All the best,

George

On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
Sage,

correct me if I am wrong, but this is for when you have a surviving monitor,
right?

Yes.  By surviving I mean that the mon data directory has not been
deleted.

My problem is that I cannot extract the monmap from any of them!

Do you mean that the ceph -s or ceph health commands fail to connect (the
monitors cannot form quorum), or do you mean that when you follow the
instructions on that link and run the 'ceph-mon --extract-monmap ...'
command (NOT 'ceph mon getmap ...') you get some error?  If so, please
paste the output!

I have a suspicion, though, that we're just using different terms.  The
original monitor's data is probably just fine, but something went wrong
with the configuration and it can't form a quorum with the one you tried
to add, so all of the commands are failing.  If so, that's precisely the
situation the linked procedure will correct...
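
Concretely, on the surviving monitor that boils down to something like the
following (a sketch only; the mon ids 'fu' and 'jin' and the /tmp path are
just the names used in this thread):

   # with the monitor daemon stopped, read the monmap straight from the
   # mon data directory -- no quorum is needed for this
   /etc/init.d/ceph stop mon.fu
   ceph-mon -i fu --extract-monmap /tmp/monmap
   monmaptool --print /tmp/monmap      # inspect what it currently contains

   # drop the half-added monitor, put the map back, and restart
   monmaptool /tmp/monmap --rm jin
   ceph-mon -i fu --inject-monmap /tmp/monmap
   /etc/init.d/ceph start mon.fu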

sage

Best,

George

> On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
> > Not a healthy monitor means that I cannot get a monmap from any of them!
>
> If you look at the procedure at
>
>
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
>
> you'll notice that you do not need any running monitors--it extracts the
> monmap from the data directory.  This procedure should let you remove all
> trace of the new monitor so that the original works as before.
>
> sage
>
>
> > and none of the commands ceph health etc. are working.
>
> >
> > Best,
> >
> > George
> >
> > > Yes Sage!
> > >
> > > Priority is to fix things!
> > >
> > > Right now I don't have a healthy monitor!
> > >
> > > Can I remove all of them and add the first one from scratch?
> > >
> > > What would that mean about the data??
> > >
> > > Best,
> > >
> > > George
> > >
> > > > On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
> > > > > This is the message that is flooding the ceph-mon.log now:
> > > > >
> > > > >
> > > > >  2015-03-14 08:16:39.286823 7f9f6920b700  1
> > > > >  mon.fu@0(electing).elector(1) init, last seen epoch 1
> > > > > 2015-03-14 08:16:42.736674 7f9f6880a700 1 mon.fu@0(electing) e2
> > > > >  adding peer 15.12.6.21:6789/0 to list of hints
> > > > >  2015-03-14 08:16:42.737891 7f9f6880a700  1
> > > > > mon.fu@0(electing).elector(1) discarding election message:
> > > > >  15.12.6.21:6789/0
> > > > >  not in my monmap e2: 2 mons at
> > > > >  {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}
> > > >
> > > > It sounds like you need to follow some variation of this procedure:
> > > >
> > > >
> > > > http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
> > > >
> > > > ...although it may be that simply killing the daemon running on
> > > > 15.12.6.21 and restarting the other mon daemons will be enough.  If
> > > > not, the procedure linked above will let you remove all traces of it
> > > > and get things up again.
> > > >
> > > > Not quite sure where things went awry but I assume the priority is to
> > > > get things working first and figure that out later!
> > > >
> > > > sage
> > > >
> > > > >
> > > > >
> > > > >
> > > > >  George
> > > > >
> > > > >
> > > > > > This is the log for monitor (ceph-mon.log) when I try to restart
> > > > > > the monitor:
> > > > > >
> > > > > >
> > > > > > 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2
> > > > > > *** Got Signal Terminated ***
> > > > > > 2015-03-14 07:47:26.384593 7f1f1dc0f700 1 mon.fu@0(probing) e2
> > > > > > shutdown
> > > > > > 2015-03-14 07:47:26.384654 7f1f1dc0f700 0 quorum service shutdown
> > > > > > 2015-03-14 07:47:26.384657 7f1f1dc0f700  0
> > > > > > mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
> > > > > > services
> > > > > > 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
> > > > > > 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
> > > > > > (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid
> > > > > > 17050
> > > > > > 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0
> > > > > > at 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
> > > > > > a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > > > 2015-03-14 07:47:27.703421 7fc04b4437a0 1 mon.fu@-1(probing) e2
> > > > > > preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > > > 2015-03-14 07:47:27.704504 7fc04b4437a0  1
> > > > > > mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
> > > > > > upgraded, format 0 -> 1
> > > > > > 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg
> > > > > > v0 on_upgrade discarding in-core PGMap
> > > > > > 2015-03-14 07:47:27.837060 7fc04b4437a0 0 mon.fu@-1(probing).mds
> > > > > > e104 print_map
> > > > > > epoch     104
> > > > > > flags     0
> > > > > > created   2014-11-30 01:58:17.176938
> > > > > > modified  2015-03-14 06:07:05.683239
> > > > > > tableserver       0
> > > > > > root      0
> > > > > > session_timeout   60
> > > > > > session_autoclose 300
> > > > > > max_file_size     1099511627776
> > > > > > last_failure      0
> > > > > > last_failure_osd_epoch    1760
> > > > > > compat    compat={},rocompat={},incompat={1=base v0.20,2=client
> > > > > > writeable ranges,3=default file layouts on dirs,4=dir inode in
> > > > > > separate object,5=mds uses versioned encoding,6=dirfrag is stored
> > > > > > in omap}
> > > > > > max_mds   1
> > > > > > in        0
> > > > > > up        {0=59315}
> > > > > > failed
> > > > > > stopped
> > > > > > data_pools        3
> > > > > > metadata_pool     4
> > > > > > inline_data       disabled
> > > > > > 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9
> > > > > >
> > > > > > 2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd
> > > > > > e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > > 2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd
> > > > > > e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > > 2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd
> > > > > > e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > > 2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd
> > > > > > e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > > 2015-03-14 07:47:27.839054 7fc04b4437a0  1
> > > > > > mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded,
> > > > > > format 0 -> 1
> > > > > > 2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2
> > > > > > my rank is now 0 (was -1)
> > > > > > 2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0
> > > > > > >> 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0
> > > > > > c=0x38c0dc0).fault
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >> I can no longer start my OSDs :-@
> > > > > >>
> > > > > >>
> > > > > >> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
> > > > > >> --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush
> > > > > >> create-or-move -- 6 3.63 host=fu root=default'
> > > > > >>
> > > > > >>
> > > > > >> Please help!!!
> > > > > >>
> > > > > >> George
> > > > > >>
> > > > > >>> ceph mon add stops at this:
> > > > > >>>
> > > > > >>>
> > > > > >>> [jin][INFO  ] Running command: sudo ceph mon getmap -o
> > > > > >>> /var/lib/ceph/tmp/ceph.raijin.monmap
> > > > > >>>
> > > > > >>>
> > > > > >>> and never gets over it!!!!!
> > > > > >>>
> > > > > >>>
> > > > > >>> Any help??
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>>
> > > > > >>>
> > > > > >>> George
> > > > > >>>
> > > > > >>>> Guys, any help much appreciated because my cluster is down :-(
> > > > > >>>>
> > > > > >>>> After trying ceph mon add, which didn't complete since it was
> > > > > >>>> stuck forever here:
> > > > > >>>>
> > > > > >>>> [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700 0
> > > > > >>>> monclient:
> > > > > >>>> hunting for new mon
> > > > > >>>> ^CKilled by signal 2.
> > > > > >>>> [ceph_deploy][ERROR ] KeyboardInterrupt
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> the previously healthy node is now down completely :-(
> > > > > >>>>
> > > > > >>>> $ ceph mon stat
> > > > > >>>> 2015-03-14 07:21:37.782360 7ff2545b1700  0 --
> > > > > >>>> 192.168.1.100:0/1042061 >> 192.168.1.101:6789/0
> > > > > >>>> pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1
> > > > > >>>> c=0x7ff248000e90).fault
> > > > > >>>> ^CError connecting to cluster: InterruptedOrTimeoutError
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Any ideas??
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> All the best,
> > > > > >>>>
> > > > > >>>> George
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>> Georgios,
> > > > > >>>>>
> > > > > >>>>> you need to be on the "deployment server" and cd into the
> > > > > >>>>> folder that you originally used while deploying CEPH - in this
> > > > > >>>>> folder you should already have ceph.conf, the client.admin
> > > > > >>>>> keyring and other stuff - which is required to connect to the
> > > > > >>>>> cluster...and provision new MONs or OSDs, etc.
> > > > > >>>>>
> > > > > >>>>> Message:
> > > > > >>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run
> > > > > >>>>> new to
> > > > > >>>>> create a new cluster...
> > > > > >>>>>
> > > > > >>>>> ...means (if I'm not mistaken) that you are running
> > > > > >>>>> ceph-deploy NOT from the original folder...
> > > > > >>>>>
> > > > > >>>>> On 13 March 2015 at 23:03, Georgios Dimitrakakis wrote:
> > > > > >>>>>
> > > > > >>>>>> Not a firewall problem!! Firewall is disabled ...
> > > > > >>>>>>
> > > > > >>>>>> Loic, I've tried mon create because of this:
> > > > > >>>>>>
> > > > > >>>>>> http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors
> > > > > >>>>>> [4]
> > > > > >>>>>>
> > > > > >>>>>> Should I first create and then add?? What is the proper
> > > > > >>>>>> order??? Should I do it from the already existing monitor
> > > > > >>>>>> node or can I run it from the new one?
> > > > > >>>>>>
> > > > > >>>>>> If I try add from the beginning I am getting this:
> > > > > >>>>>>
> > > > > >>>>>> [ceph_deploy.conf][DEBUG ] found configuration file at:
> > > > > >>>>>> /home/.cephdeploy.conf
> > > > > >>>>>> [ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
> > > > > >>>>>> mon add jin
> > > > > >>>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found;
> > > > > >>>>>> run new to create a new cluster
> > > > > >>>>>>
> > > > > >>>>>> Regards,
> > > > > >>>>>>
> > > > > >>>>>> George
> > > > > >>>>>>
> > > > > >>>>>>> Hi,
> > > > > >>>>>>>
> > > > > >>>>>>> I think ceph-deploy mon add (instead of create) is what you
> > > > > >>>>>>> should be using.
> > > > > >>>>>>>
> > > > > >>>>>>> Cheers
> > > > > >>>>>>>
> > > > > >>>>>>> On 13/03/2015 22:25, Georgios Dimitrakakis wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> On an already available cluster I've tried to add a new
> > > > > >>>>>>>> monitor!
> > > > > >>>>>>>>
> > > > > >>>>>>>> I have used ceph-deploy mon create {NODE}
> > > > > >>>>>>>>
> > > > > >>>>>>>> where {NODE}=the name of the node
> > > > > >>>>>>>>
> > > > > >>>>>>>> and then I restarted the /etc/init.d/ceph service with a
> > > > > >>>>>>>> success at the node
> > > > > >>>>>>>> where it showed that the monitor is running like:
> > > > > >>>>>>>>
> > > > > >>>>>>>> # /etc/init.d/ceph restart
> > > > > >>>>>>>> === mon.jin ===
> > > > > >>>>>>>> === mon.jin ===
> > > > > >>>>>>>> Stopping Ceph mon.jin on jin...kill 36388...done
> > > > > >>>>>>>> === mon.jin ===
> > > > > >>>>>>>> Starting Ceph mon.jin on jin...
> > > > > >>>>>>>> Starting ceph-create-keys on jin...
> > > > > >>>>>>>>
> > > > > >>>>>>>> But checking the quorum, it doesn't show the newly added
> > > > > >>>>>>>> monitor!
> > > > > >>>>>>>>
> > > > > >>>>>>>> Plus ceph mon stat gives out only 1 monitor!!!
> > > > > >>>>>>>>
> > > > > >>>>>>>> # ceph mon stat
> > > > > >>>>>>>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1,
> > > > > >>>>>>>> quorum 0 fu
> > > > > >>>>>>>>
> > > > > >>>>>>>> Any ideas on what I have done wrong???
> > > > > >>>>>>>>
> > > > > >>>>>>>> Regards,
> > > > > >>>>>>>>
> > > > > >>>>>>>> George
> > > > > >>>>>>>> _______________________________________________
> > > > > >>>>>>>> ceph-users mailing list
> > > > > >>>>>>>> ceph-users@lists.ceph.com [2]
> > > > > >>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
> > > > > >>>>>> _______________________________________________
> > > > > >>>>>> ceph-users mailing list
> > > > > >>>>>> ceph-users@lists.ceph.com [5]
> > > > > >>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]
> > > > > >>>> _______________________________________________
> > > > > >>>> ceph-users mailing list
> > > > > >>>> ceph-users@lists.ceph.com
> > > > > >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >>> _______________________________________________
> > > > > >>> ceph-users mailing list
> > > > > >>> ceph-users@lists.ceph.com
> > > > > >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >> _______________________________________________
> > > > > >> ceph-users mailing list
> > > > > >> ceph-users@lists.ceph.com
> > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> >
> >


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
