Hi,

> Have you increased the verbosity for the monitors, restarted them, and
> looked at the log output?

First of all: the bug is still there, and the logs do not help. But I seem to have found a workaround (just for myself, not a general one).

As for the bug: I appended "debug log=20" to the ceph.conf generated by ceph-deploy, but I don't see much in the logs (and they do not get larger with this option). Here is one of the monitor logs from /var/lib/ceph; the other ones look identical.

2014-04-08 08:26:09.405714 7fd1a0a94780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 28842
2014-04-08 08:26:09.851227 7f66ecd06780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 28943
2014-04-08 08:26:09.933034 7f66ecd06780 0 mon.hvrrzceph2 does not exist in monmap, will attempt to join an existing cluster
2014-04-08 08:26:09.933417 7f66ecd06780 0 using public_addr 10.111.3.2:0/0 -> 10.111.3.2:6789/0
2014-04-08 08:26:09.934003 7f66ecd06780 1 mon.hvrrzceph2@-1(probing) e0 preinit fsid c847e327-1bc5-445f-9c7e-de0551bfde06
2014-04-08 08:26:09.934149 7f66ecd06780 1 mon.hvrrzceph2@-1(probing) e0 initial_members hvrrzceph1,hvrrzceph2,hvrrzceph3, filtering seed monmap
2014-04-08 08:26:09.937302 7f66ecd06780 0 mon.hvrrzceph2@-1(probing) e0 my rank is now 0 (was -1)
2014-04-08 08:26:09.938254 7f66e63c9700 0 -- 10.111.3.2:6789/0 >> 0.0.0.0:0/2 pipe(0x15fba00 sd=21 :0 s=1 pgs=0 cs=0 l=0 c=0x15c9c60).fault
2014-04-08 08:26:09.938442 7f66e61c7700 0 -- 10.111.3.2:6789/0 >> 10.112.3.2:6789/0 pipe(0x1605280 sd=22 :0 s=1 pgs=0 cs=0 l=0 c=0x15c99a0).fault
2014-04-08 08:26:09.939001 7f66ecd04700 0 -- 10.111.3.2:6789/0 >> 0.0.0.0:0/1 pipe(0x15fb280 sd=25 :0 s=1 pgs=0 cs=0 l=0 c=0x15c9420).fault
2014-04-08 08:26:09.939120 7f66e62c8700 0 -- 10.111.3.2:6789/0 >> 10.112.3.1:6789/0 pipe(0x1605780 sd=24 :0 s=1 pgs=0 cs=0 l=0 c=0x15c9b00).fault
2014-04-08 08:26:09.941140 7f66e60c6700 0 -- 10.111.3.2:6789/0 >> 10.112.3.3:6789/0 pipe(0x1605c80 sd=23 :0 s=1 pgs=0 cs=0 l=0 c=0x15c9840).fault
2014-04-08 08:27:09.934720 7f66e7bcc700 0 mon.hvrrzceph2@0(probing).data_health(0) update_stats avail 70% total 15365520 used 3822172 avail 10762804
2014-04-08 08:28:09.935036 7f66e7bcc700 0 mon.hvrrzceph2@0(probing).data_health(0) update_stats avail 70% total 15365520 used 3822172 avail 10762804

ceph-deploy complained about not getting an answer from ceph-create-keys:

[hvrrzceph3][DEBUG ] Starting ceph-create-keys on hvrrzceph3...
[hvrrzceph3][WARNIN] No data was received after 7 seconds, disconnecting...

I therefore tried to create the keys manually:

hvrrzceph2:~ # ceph-create-keys --id client.admin
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
[etc.]

As for the workaround: here is what I wanted to do. I have three servers, each with two NICs and three IP addresses. The NICs are bonded; the bond has an IP address in one network (untagged), and additionally two tagged VLANs are on the bond. The bug occurred when I tried to use a dedicated cluster network (i.e. one of the tagged VLANs) and a separate dedicated public network (the other tagged VLAN). At that time, I had

I now tried leaving "cluster network" and "public network" out of ceph.conf ... and now I could create the cluster. So it seems to be a network problem, as you (and Brian) supposed. However, ssh etc. work properly on all three networks. I don't really understand what's going on there, but at least I can continue to learn. Thank you.

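One thing that stands out in the monitor log above: the monitor binds to 10.111.3.2 but tries to reach its peers at 10.112.3.1, 10.112.3.2 and 10.112.3.3, and every one of those connections faults. Whether that is the actual cause is not confirmed in this thread, but a quick way to spot such a mismatch is to check the probed peer addresses against the configured public network. This is only a minimal sketch in Python: the function name is made up for illustration, and the /24 prefix is an assumption, since the real netmask is not stated anywhere in this mail.

```python
import ipaddress


def addrs_outside_network(network_cidr, addrs):
    """Return the addresses in `addrs` that do not fall inside `network_cidr`."""
    # strict=False tolerates a CIDR written with host bits set (e.g. 10.111.3.2/24).
    net = ipaddress.ip_network(network_cidr, strict=False)
    return [a for a in addrs if ipaddress.ip_address(a) not in net]


# ASSUMPTION: /24 prefix; the thread does not give the actual netmask.
public_network = "10.111.3.0/24"

# Peer addresses the monitor tried to contact, copied from the log above.
probed_peers = ["10.112.3.1", "10.112.3.2", "10.112.3.3"]

if __name__ == "__main__":
    for addr in addrs_outside_network(public_network, probed_peers):
        print(f"{addr} is outside {public_network}")
```

With the values above, all three peer addresses are reported as lying outside the assumed public network, which would at least be consistent with the monitors never forming a quorum.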
best regards
Diedrich

-- 
Diedrich Ehlerding, Fujitsu Technology Solutions GmbH,
FTS CE SC PS&IS W, Hildesheimer Str 25, D-30880 Laatzen
Fon +49 511 8489-1806, Fax -251806, Mobil +49 173 2464758
Firmenangaben: http://de.ts.fujitsu.com/imprint.html
