Hi,

This happens after we restart the active MDS: somehow the standby MDS daemon cannot take over successfully and gets stuck at up:replay. It is showing the following log. Any idea how to fix this?
2019-04-02 12:54:00.985079 7f6f70670700 1 mds.WXS0023 respawn
2019-04-02 12:54:00.985095 7f6f70670700 1 mds.WXS0023 e: '/usr/bin/ceph-mds'
2019-04-02 12:54:00.985097 7f6f70670700 1 mds.WXS0023 0: '/usr/bin/ceph-mds'
2019-04-02 12:54:00.985099 7f6f70670700 1 mds.WXS0023 1: '-f'
2019-04-02 12:54:00.985100 7f6f70670700 1 mds.WXS0023 2: '--cluster'
2019-04-02 12:54:00.985101 7f6f70670700 1 mds.WXS0023 3: 'ceph'
2019-04-02 12:54:00.985102 7f6f70670700 1 mds.WXS0023 4: '--id'
2019-04-02 12:54:00.985103 7f6f70670700 1 mds.WXS0023 5: 'WXS0023'
2019-04-02 12:54:00.985104 7f6f70670700 1 mds.WXS0023 6: '--setuser'
2019-04-02 12:54:00.985105 7f6f70670700 1 mds.WXS0023 7: 'ceph'
2019-04-02 12:54:00.985106 7f6f70670700 1 mds.WXS0023 8: '--setgroup'
2019-04-02 12:54:00.985107 7f6f70670700 1 mds.WXS0023 9: 'ceph'
2019-04-02 12:54:00.985142 7f6f70670700 1 mds.WXS0023 respawning with exe /usr/bin/ceph-mds
2019-04-02 12:54:00.985145 7f6f70670700 1 mds.WXS0023 exe_path /proc/self/exe
2019-04-02 12:54:02.139272 7ff8a739a200 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 3369045
2019-04-02 12:54:02.141565 7ff8a739a200 0 pidfile_write: ignore empty --pid-file
2019-04-02 12:54:06.675604 7ff8a0ecd700 1 mds.WXS0023 handle_mds_map standby
2019-04-02 12:54:26.114757 7ff8a0ecd700 1 mds.0.136021 handle_mds_map i am now mds.0.136021
2019-04-02 12:54:26.114764 7ff8a0ecd700 1 mds.0.136021 handle_mds_map state change up:boot --> up:replay
2019-04-02 12:54:26.114779 7ff8a0ecd700 1 mds.0.136021 replay_start
2019-04-02 12:54:26.114784 7ff8a0ecd700 1 mds.0.136021 recovery set is
2019-04-02 12:54:26.114789 7ff8a0ecd700 1 mds.0.136021 waiting for osdmap 14333 (which blacklists prior instance)
2019-04-02 12:54:26.141256 7ff89a6c0700 0 mds.0.cache creating system inode with ino:0x100
2019-04-02 12:54:26.141454 7ff89a6c0700 0 mds.0.cache creating system inode with ino:0x1
2019-04-02 12:54:50.148022 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:50.148049 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:54:52.143637 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:54.148122 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:54.148157 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:54:57.143730 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:58.148239 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:54:58.148249 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:02.143819 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:02.148311 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:02.148330 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:06.148393 7ff89dec7700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:06.148416 7ff89dec7700 1 mds.beacon.WXS0023 _send skipping beacon, heartbeat map not healthy
2019-04-02 12:55:07.143914 7ff8a1ecf700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-04-02 12:55:07.615602 7ff89e6c8700 1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2019-04-02 12:55:07.618294 7ff8a0ecd700 1 mds.WXS0023 map removed me (mds.-1 gid:7441294) from cluster due to lost contact; respawning
2019-04-02 12:55:07.618296 7ff8a0ecd700 1 mds.WXS0023 respawn
2019-04-02 12:55:07.618314 7ff8a0ecd700 1 mds.WXS0023 e: '/usr/bin/ceph-mds'
2019-04-02 12:55:07.618318 7ff8a0ecd700 1 mds.WXS0023 0: '/usr/bin/ceph-mds'
2019-04-02 12:55:07.618319 7ff8a0ecd700 1 mds.WXS0023 1: '-f'
2019-04-02 12:55:07.618320 7ff8a0ecd700 1 mds.WXS0023 2: '--cluster'
2019-04-02 12:55:07.618320 7ff8a0ecd700 1 mds.WXS0023 3: 'ceph'
2019-04-02 12:55:07.618321 7ff8a0ecd700 1 mds.WXS0023 4: '--id'
2019-04-02 12:55:07.618321 7ff8a0ecd700 1 mds.WXS0023 5: 'WXS0023'
2019-04-02 12:55:07.618322 7ff8a0ecd700 1 mds.WXS0023 6: '--setuser'
2019-04-02 12:55:07.618323 7ff8a0ecd700 1 mds.WXS0023 7: 'ceph'
2019-04-02 12:55:07.618323 7ff8a0ecd700 1 mds.WXS0023 8: '--setgroup'
2019-04-02 12:55:07.618325 7ff8a0ecd700 1 mds.WXS0023 9: 'ceph'
2019-04-02 12:55:07.618352 7ff8a0ecd700 1 mds.WXS0023 respawning with exe /usr/bin/ceph-mds
2019-04-02 12:55:07.618353 7ff8a0ecd700 1 mds.WXS0023 exe_path /proc/self/exe
2019-04-02 12:55:09.174064 7f4c596be200 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 3369045
2019-04-02 12:55:09.176292 7f4c596be200 0 pidfile_write: ignore empty --pid-file
2019-04-02 12:55:13.296469 7f4c531f1700 1 mds.WXS0023 handle_mds_map standby

Thanks!
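P.S. For context, this is roughly how the MDS state can be inspected from the standard ceph CLI while the daemon sits in up:replay (the daemon name WXS0023 is taken from the log above):

    ceph mds stat                    # summary of ranks and their states (up:replay, etc.)
    ceph fs dump                     # full MDS map, including standbys and the blacklist epoch
    ceph daemon mds.WXS0023 status   # admin socket on the MDS host: whoami, state, mdsmap epoch

Given the "_send skipping beacon, heartbeat map not healthy" messages, one thing that might help (an assumption on our side, not something we have tried yet) is raising mds_beacon_grace on the monitors so the replaying daemon is not removed for lost contact while it is busy, e.g.:

    ceph tell mon.* injectargs '--mds_beacon_grace 600'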
