Hi all, 

We recently encountered an issue where our CephFS filesystem was unexpectedly 
set to read-only. Looking at the logs from the daemons, I can see the 
following: 

On the MDS:
...
2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) 
Message too long, force readonly...
2019-05-18 16:34:24.341 7fb3bd610700  1 mds.0.cache force file system read-only
2019-05-18 16:34:24.341 7fb3bd610700  0 log_channel(cluster) log [WRN] : force 
file system read-only
2019-05-18 16:34:41.289 7fb3c0616700  1 heartbeat_map is_healthy 'MDSRank' had 
timed out after 15
2019-05-18 16:34:41.289 7fb3c0616700  0 mds.beacon.objmds00 Skipping beacon 
heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not 
healthy!
...

On one of the OSDs the MDS was most likely targeting:
...
2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 
682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 
n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 
673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 
682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 
> osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 
49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] 
ondisk+write+known_if_redirected+full_force e682796) v8
2019-05-18 17:10:33.695 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c 
scrub starts
2019-05-18 17:10:34.980 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c 
scrub ok
2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 
682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 
n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 
673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 
682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 
> osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 
49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] 
snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
...

During this time there were some health concerns with the cluster. Notably, 
since the error above seems to be related to the SessionMap, we had a client 
with a few requests blocked for over 35948 secs (it's a member of a compute 
cluster, so we let the node drain/finish its jobs before rebooting). We have 
also had issues with certain OSDs on older hardware staying up and responding 
to heartbeats in a timely manner after upgrading to Nautilus, although that 
appears to be an iowait/load issue that we are actively working to mitigate 
separately.
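
In case it helps, this is roughly how I have been sanity-checking the size of 
the mds0_sessionmap object's omap with python-rados. It's only a minimal 
sketch: "cephfs_metadata" is a stand-in for our actual metadata pool name, and 
a map with more entries than the max_return below would need another pass 
using start_after.

    import rados

    # Minimal sketch, assuming python-rados is installed and the CephFS
    # metadata pool is named "cephfs_metadata" (substitute the real name).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('cephfs_metadata')

    total_bytes = 0
    num_keys = 0
    with rados.ReadOpCtx() as read_op:
        # Read up to 1,000,000 omap entries from mds0_sessionmap; a larger
        # map would need to be paged through with the start_after argument.
        it, ret = ioctx.get_omap_vals(read_op, "", "", 1000000)
        ioctx.operate_read_op(read_op, "mds0_sessionmap")
        for key, val in it:
            num_keys += 1
            # Approximate payload: key length plus value length (keys may be
            # returned as str, so encode before counting bytes).
            total_bytes += len(key.encode() if isinstance(key, str) else key) + len(val)

    print("mds0_sessionmap: %d omap keys, ~%d bytes" % (num_keys, total_bytes))
    ioctx.close()
    cluster.shutdown()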

We are running Nautilus 14.2.1 on RHEL 7.6. There is only one MDS rank, with 
an active/standby setup between two MDS nodes. Clients mount CephFS using the 
RHEL 7.6 kernel driver. 

My read here is that the MDS is sending too large a message to the OSD; 
however, my understanding was that the MDS should be using osd_max_write_size 
to determine the maximum size of that message [0]. Could this be a bug in how 
that size is calculated on the MDS side?
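
For what it's worth, the numbers in the OSD log line up with the 90 MiB value 
(quick arithmetic below; that 90 MiB is my reading of the default 
osd_max_write_size, which happens to match the 94371840 in the logs, not 
something I've confirmed against our running config):

    # Compare the rejected MDS write sizes against osd_max_write_size,
    # assuming the 90 MiB default (94371840 bytes, as seen in the logs).
    osd_max_write_size = 90 * 1024 * 1024
    for msg_data_len in (95146005, 95903764):  # the two do_op rejections above
        print(msg_data_len, "exceeds the cap by",
              msg_data_len - osd_max_write_size, "bytes")

So both rejected writes are only about 1-2 MiB over the limit, which is why I 
suspect the MDS's sizing of the SessionMap write rather than a wildly 
misconfigured value.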


Thanks!
Ryan Leimenstoll
[email protected]
University of Maryland Institute for Advanced Computer Studies



[0] https://www.spinics.net/lists/ceph-devel/msg11951.html