Hi,

I recently had an MDS outage because the MDS suicided due to "dne in the mds
map".
I've asked about this here before, and I know it happens because the
monitors removed this MDS from the MDS map even though it was still alive.
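
As I understand it, the MDS sends a beacon to the monitors every
mds_beacon_interval seconds, and the mons mark it laggy (and eventually
replace it) if no beacon arrives within mds_beacon_grace. Assuming I have
that right, I'm considering widening the grace period so a short mon stall
doesn't get a healthy MDS kicked out. Something like this in ceph.conf
(the values are just an example, and I believe the grace is enforced by
the mons, hence [global]):

    [global]
    # beacon every 4s (the default, as far as I know)
    mds beacon interval = 4
    # raised from the default 15s so a brief mon election pause
    # doesn't get an otherwise healthy MDS marked failed
    mds beacon grace = 60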

The weird thing is that there were no network-related issues at the time;
if there had been, they would have impacted many other systems.

I found this in the mon logs, and I'd like to understand it better:
 lease_timeout -- calling new election
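
From what I've read, each peon holds a time-limited paxos lease granted by
the leader, and if the lease expires before it gets renewed, the peon
assumes the leader is gone and calls a new election. If that's correct,
these are the knobs involved (names and defaults as I understand them on
Jewel; please correct me if I'm wrong):

    [mon]
    # how long a granted lease is valid, in seconds (default 5)
    mon lease = 5
    # how often the leader renews the peons' leases (default 3)
    mon lease renew interval = 3
    # how long the leader waits for lease acks from peons (default 10)
    mon lease ack timeout = 10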

full logs:

2017-08-08 23:12:33.286908 7f2b8398d700  1 leveldb: Manual compaction at
level-1 from 'pgmap_pg\x009.a' @ 1830392430 : 1 .. 'paxos\x0057687834' @ 0
: 0; will stop at (end)

2017-08-08 23:12:36.885087 7f2b86f9a700  0
mon.bhs1-mail02-ds03@2(peon).data_health(3524)
update_stats avail 81% total 19555 MB, used 2632 MB, avail 15907 MB
2017-08-08 23:13:29.357625 7f2b86f9a700  1
mon.bhs1-mail02-ds03@2(peon).paxos(paxos
updating c 57687834..57688383) lease_timeout -- calling new election
2017-08-08 23:13:29.358965 7f2b86799700  0 log_channel(cluster) log [INF] :
mon.bhs1-mail02-ds03 calling new monitor election
2017-08-08 23:13:29.359128 7f2b86799700  1
mon.bhs1-mail02-ds03@2(electing).elector(3524)
init, last seen epoch 3524
2017-08-08 23:13:35.383530 7f2b86799700  1 mon.bhs1-mail02-ds03@2(peon).osd
e12617 e12617: 19 osds: 19 up, 19 in
2017-08-08 23:13:35.605839 7f2b86799700  0 mon.bhs1-mail02-ds03@2(peon).mds
e18460 print_map
e18460
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}

Filesystem 'cephfs' (2)
fs_name cephfs
epoch   18460
flags   0
created 2016-08-01 11:07:47.592124
modified        2017-07-03 10:32:44.426431
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    0
last_failure_osd_epoch  12617
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in      0
up      {0=1574278}
failed
damaged
stopped
data_pools      8,9
metadata_pool   7
inline_data     disabled
1574278:        10.0.2.4:6800/2556733458 'd' mds.0.18460 up:replay seq 1
laggy since 2017-08-08 23:13:35.174109 (standby for rank 0)



2017-08-08 23:13:35.606303 7f2b86799700  0 log_channel(cluster) log [INF] :
mon.bhs1-mail02-ds03 calling new monitor election
2017-08-08 23:13:35.606361 7f2b86799700  1
mon.bhs1-mail02-ds03@2(electing).elector(3526)
init, last seen epoch 3526
2017-08-08 23:13:36.885540 7f2b86f9a700  0
mon.bhs1-mail02-ds03@2(peon).data_health(3528)
update_stats avail 81% total 19555 MB, used 2636 MB, avail 15903 MB
2017-08-08 23:13:38.311777 7f2b86799700  0 mon.bhs1-mail02-ds03@2(peon).mds
e18461 print_map
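
What strikes me is the timing: the MDS shows up as "laggy since
23:13:35.174109", right inside the election window that opened at
23:13:29, and there's also a ~50s gap in this mon's log between the
23:12:36 update_stats line and the lease_timeout. My working theory is
that the mons couldn't process the MDS beacons while they were electing,
so the beacons timed out. To cross-check the timeouts against the log
timestamps I've been reading the live values off the admin socket,
roughly like this (assuming the socket is at its default path):

    # run on the monitor host; mon.bhs1-mail02-ds03 is this peon
    ceph daemon mon.bhs1-mail02-ds03 config get mon_lease
    ceph daemon mon.bhs1-mail02-ds03 config get mds_beacon_grace

Does that theory hold, or is there something else that can trigger the
lease_timeout here?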


Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil