Hi,

This happens after we restart the active MDS: somehow the standby MDS daemon cannot take over successfully and gets stuck at up:replay. It is showing the following log. Any idea how to fix this?
2019-04-02 12:54:00.985079 7f6f70670700 1 mds.WXS0023 respawn
2019-04-02 12:54:00.985095 7f6f70670700 1 mds.WXS0023 e: '/usr/bin/ceph-mds'
2019-04-02 12:54:00.985097 7f6f70670700 1 mds.WXS0023 0: '/usr/bin/ceph-mds'
2019-04-02 12:54:00.985099 7f6f70670700 1 mds.WXS0023 1: '-f'
2019-04-02 12:54:00.985100 7f6f70670700 1 mds.WXS0023 2: '--cluster'
2019-04-02 12:54:00.985101 7f6f70670700 1 mds.WXS0023 3: 'ceph'
2019-04-02 12:54:00.985102 7f6f70670700 1 mds.WXS0023 4: '--id'
2019-04-02 12:54:00.985103 7f6f70670700 1 mds.WXS0023 5: 'WXS0023'
2019-04-02 12:54:00.985104 7f6f70670700 1 mds.WXS0023 6: '--setuser'
2019-04-02 12:54:00.985105 7f6f70670700 1 mds.WXS0023 7: 'ceph'
2019-04-02 12:54:00.985106 7f6f70670700 1 mds.WXS0023 8: '--setgroup'
2019-04-02 12:54:00.985107 7f6f70670700 1 mds.WXS0023 9: 'ceph'
2019-04-02 12:54:00.985142 7f6f70670700 1 mds.WXS0023 respawning with exe /usr/bin/ceph-mds
2019-04-02 12:54:00.985145 7f6f70670700 1 mds.WXS0023 exe_path /proc/self/exe
2019-04-02 12:54:02.139272 7ff8a739a200 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 3369045
2019-04-02 12:54:02.141565 7ff8a739a200 0 pidfile_write: ignore empty --pid-file
2019-04-02 12:54:06.675604 7ff8a0ecd700 1 mds.WXS0023 handle_mds_map standby
2019-04-02 12:54:26.114757 7ff8a0ecd700 1 mds.0.136021 handle_mds_map i am now mds.0.136021
2019-04-02 12:54:26.114764 7ff8a0ecd700 1 mds.0.136021 handle_mds_map state change up:boot --> up:replay
2019-04-02 12:54:26.114779 7ff8a0ecd700 1 mds.0.136021 replay_start
2019-04-02 12:54:26.114784 7ff8a0ecd700 1 mds.0.136021 recovery set is
2019-04-02 12:54:26.114789 7ff8a0ecd700 1 mds.0.136021 waiting for osdmap 14333 (which blacklists prior instance)
2019-04-02 12:54:26.141256 7ff89a6c0700 0 mds.0.cache creating system inode with ino:0x100
2019-04-02 12:54:26.141454 7ff89a6c0700 0 mds.0.cache creating system inode with ino:0x1
2019-04-02 12:54:50.148022 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:50.148049 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:54:52.143637 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:54.148122 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:54.148157 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:54:57.143730 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:58.148239 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:58.148249 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:02.143819 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:02.148311 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:02.148330 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:06.148393 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:06.148416 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:07.143914 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:07.615602 7ff89e6c8700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2019-04-02 12:55:07.618294 7ff8a0ecd700 1 mds.WXS0023 map removed me (mds.-1 gid:7441294) from cluster due to lost contact; respawning
2019-04-02 12:55:07.618296 7ff8a0ecd700 1 mds.WXS0023 respawn
2019-04-02 12:55:07.618314 7ff8a0ecd700 1 mds.WXS0023 e: '/usr/bin/ceph-mds'
2019-04-02 12:55:07.618318 7ff8a0ecd700 1 mds.WXS0023 0: '/usr/bin/ceph-mds'
2019-04-02 12:55:07.618319 7ff8a0ecd700 1 mds.WXS0023 1: '-f'
2019-04-02 12:55:07.618320 7ff8a0ecd700 1 mds.WXS0023 2: '--cluster'
2019-04-02 12:55:07.618320 7ff8a0ecd700 1 mds.WXS0023 3: 'ceph'
2019-04-02 12:55:07.618321 7ff8a0ecd700 1 mds.WXS0023 4: '--id'
2019-04-02 12:55:07.618321 7ff8a0ecd700 1 mds.WXS0023 5: 'WXS0023'
2019-04-02 12:55:07.618322 7ff8a0ecd700 1 mds.WXS0023 6: '--setuser'
2019-04-02 12:55:07.618323 7ff8a0ecd700 1 mds.WXS0023 7: 'ceph'
2019-04-02 12:55:07.618323 7ff8a0ecd700 1 mds.WXS0023 8: '--setgroup'
2019-04-02 12:55:07.618325 7ff8a0ecd700 1 mds.WXS0023 9: 'ceph'
2019-04-02 12:55:07.618352 7ff8a0ecd700 1 mds.WXS0023 respawning with exe /usr/bin/ceph-mds
2019-04-02 12:55:07.618353 7ff8a0ecd700 1 mds.WXS0023 exe_path /proc/self/exe
2019-04-02 12:55:09.174064 7f4c596be200 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 3369045
2019-04-02 12:55:09.176292 7f4c596be200 0 pidfile_write: ignore empty --pid-file
2019-04-02 12:55:13.296469 7f4c531f1700 1 mds.WXS0023 handle_mds_map standby

Thanks!
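P.S. For context, this is roughly how the MDS state can be inspected from the standard ceph CLI while the daemon sits in up:replay (the daemon name WXS0023 is taken from the log above):

    ceph mds stat                    # summary of ranks and their states (up:replay, etc.)
    ceph fs dump                     # full MDS map, including standbys and the blacklist epoch
    ceph daemon mds.WXS0023 status   # admin socket on the MDS host: whoami, state, mdsmap epoch

Given the "_send skipping beacon, heartbeat map not healthy" messages, one thing that might help (an assumption on our side, not something we have tried yet) is raising mds_beacon_grace on the monitors so the replaying daemon is not removed for lost contact while it is busy, e.g.:

    ceph tell mon.* injectargs '--mds_beacon_grace 600'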
