Re: [ceph-users] Ceph stops responding

Georgios Dimitrakakis Wed, 05 Mar 2014 02:22:36 -0800

My setup consists of two nodes.

The first node (master) is running:

-mds
-mon
-osd.0

and the second node (CLIENT) is running:

-osd.1

I have therefore restarted the ceph services on both nodes.

I left "ceph -w" running for as long as it could; after a few seconds it produces this error:

2014-03-05 12:08:17.715699 7fba13fff700 0 monclient: hunting for new mon
2014-03-05 12:08:17.716108 7fba102f8700 0 -- 192.168.0.10:0/1008298 >> X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba080090b0).fault



(where X.Y.Z.X is the public IP of the CLIENT node).
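The fault line above shows the client trying to reach a monitor at X.Y.Z.X:6789 (6789 being the default monitor port), while the setup described earlier has the mon running on the master node. One thing that may be worth checking is whether anything accepts TCP connections at that address and port. The following is only a minimal sketch in Python; the helper name can_connect and the 3-second timeout are assumptions made for illustration, not part of any Ceph tooling.

```python
import socket


def can_connect(host, port=6789, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests that something accepts the connection; it does not
    speak the Ceph monitor protocol or check authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Hypothetical usage, with the addresses replaced by your own:
#   can_connect("X.Y.Z.X")       # the address the client is hunting for
#   can_connect("<master-ip>")   # the node that actually runs ceph-mon
```

If the first call fails while the second succeeds, the monitor address the client has been given would be a plausible suspect, though this check alone cannot confirm it.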

And it keeps going on like that...

After a few minutes, "ceph health" shows the following:

2014-03-05 12:12:58.355677 7effc52fb700 0 monclient(hunting): authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700 0 librados: client.admin authentication error (110) Connection timed out

Error connecting to cluster: TimedOut


Any ideas now??

Best,

G.

On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:

First try to start the OSD nodes by restarting the ceph service on the
ceph nodes. If it works fine, you should be able to see the ceph-osd
process running in the process list. You do not need to add any public
or private network in ceph.conf. If none of the OSDs run, you need to
reconfigure them from the monitor node.

Please check whether the ceph-mon process is running on the monitor node.
ceph-mds should not be running.

Also check that the /etc/hosts file has valid IP addresses for the cluster nodes.
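The /etc/hosts check can be sketched roughly as follows, in Python. This is an illustration under assumptions: the function names parse_hosts and loopback_only are made up here, and the hostnames node1 and node2 in the usage comment are placeholders for your own node names. It flags hostnames that map only to loopback addresses, which is one way a node name can end up pointing somewhere other nodes cannot reach.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname in hosts-file text to the list of IPs it appears with."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            continue  # first field is not a valid IP address; skip the line
        for name in names:
            mapping.setdefault(name, []).append(addr)
    return mapping


def loopback_only(mapping, hostname):
    """True if hostname is present and maps only to loopback addresses."""
    addrs = mapping.get(hostname, [])
    return bool(addrs) and all(
        ipaddress.ip_address(a).is_loopback for a in addrs
    )


# Hypothetical usage on each node:
#   with open("/etc/hosts") as f:
#       hosts = parse_hosts(f.read())
#   for name in ("node1", "node2"):
#       print(name, hosts.get(name), loopback_only(hosts, name))
```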

Finally, check that ceph.client.admin.keyring and ceph.bootstrap-osd.keyring
match on all the cluster nodes.
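One way to compare keyrings across nodes is to copy each node's file to a single machine and compare checksums. The Python sketch below assumes the copies have already been fetched (for example with scp) into local files; the file names in the usage comment are hypothetical, and the helpers file_digest and keyrings_match are illustrative names only.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def keyrings_match(paths):
    """True if every file in paths has byte-identical content.

    Note: this is a strict comparison, so a trailing-whitespace
    difference counts as a mismatch even if the key itself is the same.
    """
    digests = {file_digest(p) for p in paths}
    return len(digests) == 1


# Hypothetical usage, after copying each node's keyring locally:
#   keyrings_match(["node1.client.admin.keyring",
#                   "node2.client.admin.keyring"])
```

A mismatch reported here is worth a manual look at the files before concluding that the keys really differ.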

Best of luck.
Srinivas.

On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis wrote:

Hi!

I have installed ceph and created two osds and was very happy with
that but apparently not everything was correct.

Today, after a system reboot, the cluster comes up and for a few
moments it seems that it's OK (using the "ceph health" command), but
after a few seconds the "ceph health" command doesn't produce any
output at all.

It just stays there without anything on the screen...

ceph -w is doing the same as well...

If I restart the ceph services ("service ceph restart"), it works again
for a few seconds, but after a few more it freezes.

Initially I thought that this was a firewall problem, but apparently
it isn't.

Then I thought that this had to do with public_network and
cluster_network not being defined in ceph.conf, and changed that.
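To double-check what the public_network and cluster_network settings actually contain, and whether a given node address falls inside them, something like the Python sketch below might help. It makes several assumptions: that the settings live in the [global] section, that ceph.conf is simple enough for Python's configparser to read, and that each setting holds a single subnet (Ceph also accepts comma-separated lists, which this sketch does not split). The function names and the example subnets in the usage comment are illustrative only.

```python
import configparser
import ipaddress


def network_settings(conf_text):
    """Return the public/cluster network values found in [global], if any.

    Ceph treats spaces and underscores in option names as equivalent,
    so "public network" and "public_network" are both recognised.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    found = {}
    if parser.has_section("global"):
        for key, value in parser.items("global"):
            norm = key.strip().replace(" ", "_")
            if norm in ("public_network", "cluster_network"):
                found[norm] = value.strip()
    return found


def ip_in_network(ip, cidr):
    """True if the address lies inside the given CIDR subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


# Hypothetical usage:
#   with open("/etc/ceph/ceph.conf") as f:
#       nets = network_settings(f.read())
#   ip_in_network("192.168.0.10", nets["public_network"])
```

If a node's address is outside the configured public network, that would be consistent with daemons becoming unreachable shortly after start, but it is only one possible explanation.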

No matter what I do, the cluster works for a few seconds after
the service restart and then it stops responding...

Any help much appreciated!!!

Best,

G.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]




Links:
------
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:gior...@acmac.uoc.gr

