On Tue, Aug 6, 2013 at 12:28 AM, Sage Weil <[email protected]> wrote:
> On Mon, 5 Aug 2013, Yu Changyuan wrote:
> > The good news is that with the new patch, ceph starts OK, cephfs mounts OK, and the kvm
> > virtual machine booting from rbd boots OK (and seems to be running OK). I checked the
> > timestamp of the last file written to cephfs, and it is fairly near the time of the
> > reboot (which caused ceph to stop working). Since I don't have any other way
> > to check the integrity of the files stored in cephfs, I just randomly picked
> > some video files and played them; all seems OK.
> >
> > So, thank you very much.
> >
> > But I did not end up using the latest version of the files in /var/lib/ceph/mon/ceph-a.
> > With those files, ceph-mon starts up OK and ceph -s returns, but the osd still
> > thinks the monitor is the wrong node and refuses to work.
> > Then I thought I might try the files from 2 days ago (Aug 1st) to see what would
> > happen, and something actually did happen: ceph-osd started to work.
> > So I am a bit curious why the patched version works with the ceph-mon data
> > from 2 days ago but the original version does not,
> > and, more importantly, whether I need an extra step to make the currently
> > running ceph cluster work with a normal (unpatched) version of ceph,
> > and whether there is any chance that the current cluster will run into problems
> > in the future (if I keep the current state and take no extra step).
>
> I think you will be fine with the current state and switching back to
> normal release code.
>
So I can just stop the currently running ceph-{osd,mds,mon} and
then start the normal release (0.61.7)?
> I'm confused why ceph-osds wouldn't start with the latest mon data, but
> can't speculate too much without spending time analyzing your logs from
> the failed startup.
>
I had cleared the logs before trying the old mon data (I did not expect the old
mon data to work), and after the osds started OK,
their status changed, so I probably cannot provide enough logs for
such an analysis.
And it may not be worth the time to analyze the cause. After all, ceph
is back online again.
>
> Glad to hear you're back online!
Thank you.
> sage
>
>
> >
> >
> >
> > On Mon, Aug 5, 2013 at 12:39 AM, Sage Weil <[email protected]> wrote:
> > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > > And here is the log of ceph-mon, with debug_mon set to 10. I ran the "ceph -s"
> > > command (which blocked) on 192.168.1.2 while recording this log.
> > >
> > > https://gist.github.com/yuchangyuan/ba3e72452215221d1e82
> >
> > I pushed one more patch to that branch that should get you up. This one
> > should go to master as well.
> >
> > sage
> >
> > >
> > >
> > > On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <[email protected]> wrote:
> > > I just tried the branch, and the mon starts OK; here is the log:
> > > https://gist.github.com/yuchangyuan/3138952ac60508d18aed
> > > But ceph -s or ceph -w just blocks without returning any message (I
> > > only started the monitor, no mds or osd).
> > >
> > >
> > >
> > > On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <[email protected]> wrote:
> > >
> > > On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <[email protected]> wrote:
> > > It looks like the auth state wasn't trimmed properly. It also sort of
> > > looks like you aren't using authentication on this cluster... is that
> > > true? (The keyring file was empty.)
> > >
> > > Yes, you're right, I disabled auth. It's just a personal
> > > cluster, so the simpler the better.
> > >
> > > This looks like a trim issue, but I don't remember what all we fixed since
> > > .1.. that was a while ago! We certainly haven't seen anything like this
> > > recently.
> > >
> > > I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
> > > incrementals and will get your mon up, but you may lose some auth keys.
> > > If auth is on, you'll need to add them back again. If not, it may just
> > > work with this.
> > >
> > > You can grab the packages from
> > >
> > > http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
> > >
> > > or whatever the right dir is for your distro when they appear in about 15
> > > minutes. Let me know if that resolves it.
> > >
> > >
> > > Thank you for your work; I will try it as soon as possible.
> > > PS: My distro is Gentoo, so maybe I should build from source
> > > directly.
> > >
> > >
> > > sage
> > >
> > >
> > > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > >
> > > >
> > > >
> > > >
> > > > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <[email protected]> wrote:
> > > > On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > > > > I run a tiny ceph cluster with only one monitor. After rebooting the system,
> > > > > the monitor refuses to start.
> > > > > I tried to start ceph-mon manually with the command 'ceph -f -i a'; below are
> > > > > the first few lines of the output:
> > > > >
> > > > > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> > > > > mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
> > > > > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> > > > >
> > > > > The full log is at:
> > > > > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
> > > >
> > > > This is 0.61.1. Can you try again with 0.61.7 to rule out anything
> > > > there?
> > > >
> > > >
> > > > I just tried 0.61.7, still out of luck. Here is the log:
> > > > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
> > > >
> > > >
> > > > > So, is there any way to make the monitor work again?
> > > > >
> > > > > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and I
> > > > > successfully started the monitor with those files,
> > > > > but rados and other commands do not work because the osd keeps saying
> > > > > the monitor is the wrong node (which is right: it's actually the node
> > > > > from 2 days ago).
> > > >
> > > > In general that is not going to work well as the cluster does not like
> > > > to warp back in time. If it does not start with .7 (I suspect it won't),
> > > > can you send us a tarball of the mon data directory so we can see what
> > > > is awry?
> > > >
> > > >
> > > > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
> > > >
> > > >
> > > > sage
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Changyuan
> > > >
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Changyuan
> > >
> > >
> >
> >
> >
> >
> > --
> > Best regards,
> > Changyuan
> >
> >
>
--
Best regards,
Changyuan
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com