[ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

Vincent Godin Wed, 16 Nov 2016 10:09:47 -0800

Hello,

We now have the whole cluster (Mons, OSDs and clients) on Jewel 10.2.2
(upgraded from Hammer 0.94.5), but we still have some serious problems in
our production environment:


   - some Ceph filesystems are not mounted at startup and we have to mount
   them by hand with "/bin/sh -c 'flock /var/lock/ceph-disk
   /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync /dev/vdX1'"

   - some OSDs start but begin timing out as soon as they are up, and keep
   doing so for quite a long time (more than 5 minutes)
      - 2016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with
      init, starting boot process
      2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy
      'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
      2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy
      'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
      ...

      - these OSDs also take a very long time to stop


   - we just lost one OSD, and the cluster is unable to stabilize: some
   OSDs keep going up and down. The cluster is in ERR state and cannot
   serve the production environment


   - we are running Jewel 10.2.2 on CentOS 7.2, kernel
   3.10.0-327.36.3.el7.x86_64
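
To help quantify the heartbeat timeouts quoted above, here is a minimal
sketch that counts them per thread in an OSD log. It assumes each log entry
sits on a single line in the real file (the excerpts above are wrapped by
the mail client). The function name summarize_timeouts and the regular
expression are illustrative, not part of any Ceph tool.

```python
import re

# Assumed line layout, taken from the excerpts above (unwrapped):
# <date> <time> <thread-id>  <level> heartbeat_map is_healthy '<name>' had timed out after <secs>
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+\s+-?\d+ "
    r"heartbeat_map is_healthy '(?P<thread>[^']+)' "
    r"had timed out after (?P<secs>\d+)"
)


def summarize_timeouts(lines):
    """Return {thread name: {"count", "first", "last"}} for timeout entries."""
    summary = {}
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match is None:
            continue  # not a heartbeat timeout entry
        entry = summary.setdefault(
            match.group("thread"),
            {"count": 0, "first": match.group("ts"), "last": match.group("ts")},
        )
        entry["count"] += 1
        entry["last"] = match.group("ts")
    return summary


# The three log entries quoted in this message, joined back onto single lines.
SAMPLE = [
    "2016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with "
    "init, starting boot process",
    "2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60",
    "2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60",
]

if __name__ == "__main__":
    for thread, info in summarize_timeouts(SAMPLE).items():
        print(thread, info["count"], info["first"], info["last"])
```

The same function can be given an open log file instead of SAMPLE, for
example open("/var/log/ceph/ceph-osd.32.log"); that path is the usual
default location and is an assumption about this setup.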

Any help would be appreciated!

Vincent

