Reproduce with 'debug mds = 20' and 'debug ms = 20'.

Shinobu
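For reference, a sketch of how those debug settings can be applied (the MDS id <id> is a placeholder for your daemon name; adjust as needed). Either put them in the [mds] section of ceph.conf on the MDS node before starting it, or inject them into a running daemon:

    [mds]
        debug mds = 20
        debug ms = 20

    # or, at runtime:
    ceph tell mds.<id> injectargs '--debug_mds 20 --debug_ms 20'

With those set, the MDS log should capture the specific error at the moment the rank goes damaged again.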
On Mon, Jul 4, 2016 at 9:42 PM, Lihang <[email protected]> wrote:
> Thank you very much for your advice. The command "ceph mds repaired 0"
> worked fine in my cluster: the cluster state became HEALTH_OK and the
> CephFS state returned to normal as well. But the monitor and MDS log
> files only record the replay and recovery process without pointing out
> anything abnormal, and I no longer have the logs from when this issue
> happened, so I haven't found its root cause yet. I'll try to reproduce
> the issue. Thank you very much again!
> fisher
>
> -----Original Message-----
> From: John Spray [mailto:[email protected]]
> Sent: July 4, 2016 17:49
> To: lihang 12398 (RD)
> Cc: [email protected]
> Subject: Re: [ceph-users] Fwd: how to fix the mds damaged issue
>
> On Sun, Jul 3, 2016 at 8:06 AM, Lihang <[email protected]> wrote:
> > root@BoreNode2:~# ceph -v
> > ceph version 10.2.0
> >
> > From: lihang 12398 (RD)
> > Sent: July 3, 2016 14:47
> > To: [email protected]
> > Cc: Ceph Development; '[email protected]'; zhengbin 08747 (RD);
> > xusangdi 11976 (RD)
> > Subject: how to fix the mds damaged issue
> >
> > Hi, my Ceph cluster's MDS is damaged and the cluster is degraded after
> > our machines lost power suddenly. The cluster is now "HEALTH_ERR" and
> > cannot recover to a healthy state by itself, even after I reboot the
> > storage node or restart the Ceph cluster. After that I also used the
> > following commands to remove the damaged MDS, but the removal failed
> > and the issue persists. The other two MDS daemons are in standby.
> > Can anyone tell me how to fix this issue and find out what happened
> > in my cluster?
> >
> > The process I used to remove the damaged MDS on the affected node was:
> >
> > 1> Execute "stop ceph-mds-all" on the damaged MDS node
> >
> > 2> ceph mds rmfailed 0 --yes-i-really-mean-it
>
> rmfailed is not something you want to use in these circumstances.
>
> > 3> root@BoreNode2:~# ceph mds rm 0
> >
> > mds gid 0 dne
> >
> > The detailed status of my cluster is as follows:
> >
> > root@BoreNode2:~# ceph -s
> >     cluster 98edd275-5df7-414f-a202-c3d4570f251c
> >      health HEALTH_ERR
> >             mds rank 0 is damaged
> >             mds cluster is degraded
> >      monmap e1: 3 mons at {BoreNode2=172.16.65.141:6789/0,BoreNode3=172.16.65.142:6789/0,BoreNode4=172.16.65.143:6789/0}
> >             election epoch 1010, quorum 0,1,2 BoreNode2,BoreNode3,BoreNode4
> >       fsmap e168: 0/1/1 up, 3 up:standby, 1 damaged
> >      osdmap e338: 8 osds: 8 up, 8 in
> >             flags sortbitwise
> >       pgmap v17073: 1560 pgs, 5 pools, 218 kB data, 32 objects
> >             423 MB used, 3018 GB / 3018 GB avail
> >                 1560 active+clean
>
> When an MDS rank is marked as damaged, that means something invalid was
> found when reading from the pool storing metadata objects. The next step
> is to find out what that was. Look in the MDS log and in ceph.log from
> the time when it went damaged, to find the most specific error message
> you can.
>
> If you do not have the logs and want to have the MDS try operating again
> (to reproduce whatever condition caused it to be marked damaged), you
> can enable it by using "ceph mds repaired 0", then start the daemon and
> see how it is failing.
>
> John
>
> > root@BoreNode2:~# ceph mds dump
> > dumped fsmap epoch 168
> > fs_name TudouFS
> > epoch   156
> > flags   0
> > created 2016-04-02 02:48:11.150539
> > modified        2016-04-03 03:04:57.347064
> > tableserver     0
> > root    0
> > session_timeout 60
> > session_autoclose       300
> > max_file_size   1099511627776
> > last_failure    0
> > last_failure_osd_epoch  83
> > compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
> > max_mds 1
> > in      0
> > up      {}
> > failed
> > damaged 0
> > stopped
> > data_pools      4
> > metadata_pool   3
> > inline_data     disabled

--
Email:
[email protected]
[email protected]
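For anyone hitting the same problem, the recovery steps John describes above boil down to roughly the following. This is a sketch only: the log paths are the defaults, <id> is a placeholder for the MDS daemon name, and the service command assumes the upstart-based setup used in this thread.

    # Find the original error before clearing anything
    grep -i damaged /var/log/ceph/ceph.log
    less /var/log/ceph/ceph-mds.<id>.log

    # Tell the monitors that rank 0 may be tried again
    ceph mds repaired 0

    # Start the MDS daemon again and watch it replay (or fail)
    start ceph-mds id=<id>
    ceph -w

If the rank goes damaged again, the MDS log (with the debug settings suggested at the top of this message) should contain the specific read or decode error that caused it.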
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
