Hello all, I am trying Ceph Jewel on Ubuntu 16.04 with Kubernetes 1.6.2 and Docker 1.11.2, but for some unknown reason the cluster is not coming up: pods crash often and all ceph commands fail. From *ceph-mon-check*:
kubectl logs -n ceph ceph-mon-check-3190136794-21xg4 -f
subprocess.CalledProcessError: Command 'ceph --cluster=${CLUSTER} mon getmap > /tmp/monmap && monmaptool -f /tmp/monmap --print' returned non-zero exit status 1
2017-05-01 15:45:52 /entrypoint.sh: sleep 30 sec
2017-05-01 15:46:22 /entrypoint.sh: checking for zombie mons
2017-05-01 15:51:22.613476 7f0d3ea8c700 0 monclient(hunting): authenticate timed out after 300
2017-05-01 15:51:22.613561 7f0d3ea8c700 0 librados: client.admin authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
Traceback (most recent call last):
  File "/check_zombie_mons.py", line 30, in <module>
    current_mons = extract_mons_from_monmap()
  File "/check_zombie_mons.py", line 18, in extract_mons_from_monmap
    monmap = subprocess.check_output(monmap_command, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
    raise CalledProcessError(retcode, cmd, output=output)

All pods and nodes are able to resolve the service name "ceph-mon".

*Ceph keys are present in all pods.*

kubectl exec -n ceph ceph-mon-0 -- ls /etc/ceph/
ceph.client.admin.keyring
ceph.conf
ceph.mon.keyring

*kubectl logs -n ceph ceph-mon-0 --tail=20*

2017-05-01 16:08:44.081462 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:45.158398 7fcdf1595700 0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb0000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d603f1980).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.158328 7fcdf0f8f700 0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d602eac00).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:45.745314 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:46.081824 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:47.745473 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:48.081962 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:49.745526 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:50.081979 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:51.746027 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:52.082151 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:53.745586 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:54.082630 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:55.158549 7fcdf0b8b700 0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d6026b400 sd=19 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608ff900).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.158621 7fcdf1191700 0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb0000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d608fd500).accept failed to getpeername (107) Transport endpoint is not connected
2017-05-01 16:08:55.745867 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:56.082868 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
2017-05-01 16:08:57.686779 7fcdf3e9b700 0 mon.ceph-mon-0@-1(probing).data_health(0) update_stats avail 93% total 237 GB, used 4398 MB, avail 221 GB
2017-05-01 16:08:57.746175 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints
2017-05-01 16:08:58.083616 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints

kubectl get po -n ceph
NAME                              READY  STATUS            RESTARTS  AGE
ceph-mds-722237312-35l5k          0/1    CrashLoopBackOff  324       1d
ceph-mon-0                        1/1    Running           0         1d
ceph-mon-1                        1/1    Running           0         1d
ceph-mon-2                        1/1    Running           0         1d
ceph-mon-check-3190136794-21xg4   1/1    Running           0         1d
ceph-osd-bvz3h                    0/1    CrashLoopBackOff  409       1d
ceph-osd-hq50d                    0/1    Running           408       1d
ceph-osd-ljdwh                    0/1    CrashLoopBackOff  409       1d

kubectl logs -n ceph ceph-osd-ljdwh --tail=20
2017-05-01 16:33:57 /entrypoint.sh: k8s: config is stored as k8s secrets.
2017-05-01 16:33:57 /entrypoint.sh: k8s: does not generate the admin key. Use Kubernetes secrets instead.
2017-05-01 16:33:57 /entrypoint.sh: Creating osd with ceph --cluster ceph osd create

ceph.conf:

kubectl exec -n ceph ceph-osd-ljdwh -- cat /etc/ceph/ceph.conf |more
[global]
fsid = 34fc2470-a9f2-49df-8d2f-701e2679c8c5
cephx = true
cephx_require_signatures = false
cephx_cluster_require_signatures = true
cephx_service_require_signatures = false
# auth
max_open_files = 131072
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
osd_pool_default_min_size = 1
mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .85
mon_host = ceph-mon
[mon]
mon_osd_down_out_interval = 600
mon_osd_min_down_reporters = 4
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30

Any idea why it is failing with an authentication error?

Regards,
Kev
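
P.S. To make the ceph-mon-0 log above easier to read, here is a minimal Python sketch that tallies the monitor state, rank, and hinted peer addresses from such log lines. It assumes the line format is exactly as in the excerpt; the function name `summarize_mon_log` and the `SAMPLE_LINES` list are hypothetical and exist only for illustration. It only counts what the log says and does not diagnose the cause.

```python
import re
from collections import Counter

# Matches the monitor tag in lines such as:
#   ... mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints
# The format is assumed from the log excerpt in this post.
MON_TAG = re.compile(r"mon\.(?P<name>[\w-]+)@(?P<rank>-?\d+)\((?P<state>\w+)\)")
PEER_HINT = re.compile(
    r"adding peer (?P<addr>\d+\.\d+\.\d+\.\d+:\d+)/\d+ to list of hints"
)


def summarize_mon_log(lines):
    """Count monitor states, ranks, and hinted peers seen in the log lines."""
    states = Counter()
    ranks = set()
    peers = Counter()
    for line in lines:
        tag = MON_TAG.search(line)
        if tag is None:
            continue  # e.g. messenger "pipe(...)" lines carry no monitor tag
        states[tag.group("state")] += 1
        ranks.add(int(tag.group("rank")))
        hint = PEER_HINT.search(line)
        if hint is not None:
            peers[hint.group("addr")] += 1
    return {
        "states": dict(states),
        "ranks": sorted(ranks),
        "peer_hints": dict(peers),
    }


# A few lines copied from the ceph-mon-0 log above (hypothetical sample name).
SAMPLE_LINES = [
    "2017-05-01 16:08:44.081462 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.14.239:6789/0 to list of hints",
    "2017-05-01 16:08:45.158398 7fcdf1595700 0 -- 192.168.110.236:6789/0 >> :/0 pipe(0x562d60fb0000 sd=21 :6789 s=0 pgs=0 cs=0 l=0 c=0x562d603f1980).accept failed to getpeername (107) Transport endpoint is not connected",
    "2017-05-01 16:08:45.745314 7fcdf369a700 1 mon.ceph-mon-0@-1(probing) e0 adding peer 192.168.198.94:6789/0 to list of hints",
    "2017-05-01 16:08:57.686779 7fcdf3e9b700 0 mon.ceph-mon-0@-1(probing).data_health(0) update_stats avail 93% total 237 GB, used 4398 MB, avail 221 GB",
]

summary = summarize_mon_log(SAMPLE_LINES)
```

The same function could be fed the full output of the kubectl logs command for each mon pod, one list of lines per pod. In the excerpt above, every tagged line shows ceph-mon-0 in the probing state at rank -1.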
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com