Reproduce with 'debug mds = 20' and 'debug ms = 20'.
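
One way to set those, as a rough sketch (the daemon id mds.BoreNode2 below is
only a placeholder; adjust it to your setup):

    # in ceph.conf on the MDS host, then restart the MDS
    [mds]
        debug mds = 20
        debug ms = 20

    # or, for a daemon that is already running:
    ceph tell mds.BoreNode2 injectargs '--debug_mds 20 --debug_ms 20'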

 shinobu

On Mon, Jul 4, 2016 at 9:42 PM, Lihang <[email protected]> wrote:

> Thank you very much for your advice. The command "ceph mds repaired 0"
> worked fine in my cluster: the cluster state became HEALTH_OK and the cephfs
> state returned to normal as well. However, the monitor and mds log files only
> record the replay and recovery process without pointing out anything
> abnormal, and I no longer have the logs from when this issue happened, so I
> haven't found the root cause yet. I'll try to reproduce this issue.
> Thank you very much again!
> fisher
>
> -----Original Message-----
> From: John Spray [mailto:[email protected]]
> Sent: July 4, 2016 17:49
> To: lihang 12398 (RD)
> Cc: [email protected]
> Subject: Re: [ceph-users] Fwd: how to fix the mds damaged issue
>
> On Sun, Jul 3, 2016 at 8:06 AM, Lihang <[email protected]> wrote:
> > root@BoreNode2:~# ceph -v
> >
> > ceph version 10.2.0
> >
> >
> >
> > From: lihang 12398 (RD)
> > Sent: July 3, 2016 14:47
> > To: [email protected]
> > Cc: Ceph Development; '[email protected]'; zhengbin 08747 (RD);
> > xusangdi 11976 (RD)
> > Subject: how to fix the mds damaged issue
> >
> >
> >
> > Hi, my ceph cluster's mds is damaged and the cluster is degraded after
> > our machine room suddenly lost power. The cluster is now
> > "HEALTH_ERR" and cannot recover to a healthy state by itself, even after
> > rebooting the storage node systems or restarting the ceph cluster. After
> > that I also used the following commands to remove the damaged mds, but
> > removing the damaged mds failed and the issue still exists. The
> > other two mds are in the standby state. Who can tell me how to fix this
> > issue and find out what happened in my cluster?
> >
> > The process I used to remove the damaged mds on my storage node was as follows.
> >
> > 1>  Execute the "stop ceph-mds-all" command on the damaged mds node
> >
> > 2>  ceph mds rmfailed 0 --yes-i-really-mean-it
>
> rmfailed is not something you want to use in these circumstances.
>
> > 3>  root@BoreNode2:~# ceph  mds rm 0
> >
> > mds gid 0 dne
> >
> >
> >
> > The detailed status of my cluster is as follows:
> >
> > root@BoreNode2:~# ceph -s
> >
> >   cluster 98edd275-5df7-414f-a202-c3d4570f251c
> >
> >      health HEALTH_ERR
> >
> >             mds rank 0 is damaged
> >
> >             mds cluster is degraded
> >
> >      monmap e1: 3 mons at
> > {BoreNode2=172.16.65.141:6789/0,BoreNode3=172.16.65.142:6789/0,BoreNod
> > e4=172.16.65.143:6789/0}
> >
> >             election epoch 1010, quorum 0,1,2
> > BoreNode2,BoreNode3,BoreNode4
> >
> >       fsmap e168: 0/1/1 up, 3 up:standby, 1 damaged
> >
> >      osdmap e338: 8 osds: 8 up, 8 in
> >
> >             flags sortbitwise
> >
> >       pgmap v17073: 1560 pgs, 5 pools, 218 kB data, 32 objects
> >
> >             423 MB used, 3018 GB / 3018 GB avail
> >
> >                 1560 active+clean
>
> When an MDS rank is marked as damaged, that means something invalid was
> found when reading from the pool storing metadata objects.  The next step
> is to find out what that was.  Look in the MDS log and in ceph.log from the
> time when it went damaged, to find the most specific error message you can.
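> For example, something along these lines (assuming the default log locations;
> adjust the paths and patterns to your setup):
>
>     # on the MDS host
>     grep -iE 'damaged|error|assert' /var/log/ceph/ceph-mds.*.log
>     # on a monitor host
>     grep -i 'damaged' /var/log/ceph/ceph.log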
>
> If you do not have the logs and want to have the MDS try operating again
> (to reproduce whatever condition caused it to be marked damaged), you can
> enable it by using "ceph mds repaired 0", then start the daemon and see how
> it is failing.
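> Roughly, as a sketch (the service command depends on your init system, and
> the mds id below is only a placeholder):
>
>     ceph mds repaired 0
>     # upstart:
>     start ceph-mds id=BoreNode2
>     # or systemd:
>     systemctl start ceph-mds@BoreNode2
>     # then watch the MDS log while it replays
>     tail -f /var/log/ceph/ceph-mds.BoreNode2.log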
>
> John
>
> > root@BoreNode2:~# ceph mds dump
> >
> > dumped fsmap epoch 168
> >
> > fs_name TudouFS
> >
> > epoch   156
> >
> > flags   0
> >
> > created 2016-04-02 02:48:11.150539
> >
> > modified        2016-04-03 03:04:57.347064
> >
> > tableserver     0
> >
> > root    0
> >
> > session_timeout 60
> >
> > session_autoclose       300
> >
> > max_file_size   1099511627776
> >
> > last_failure    0
> >
> > last_failure_osd_epoch  83
> >
> > compat  compat={},rocompat={},incompat={1=base v0.20,2=client
> > writeable ranges,3=default file layouts on dirs,4=dir inode in
> > separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> > omap,8=file layout v2}
> >
> > max_mds 1
> >
> > in      0
> >
> > up      {}
> >
> > failed
> >
> > damaged 0
> >
> > stopped
> >
> > data_pools      4
> >
> > metadata_pool   3
> >
> > inline_data     disabled
> >
> > ----------------------------------------------------------------------
> > ---------------------------------------------------------------
> > This e-mail and its attachments contain confidential information from
> > H3C, which is intended only for the person or entity whose address is
> > listed above. Any use of the information contained herein in any way
> > (including, but not limited to, total or partial disclosure,
> > reproduction, or dissemination) by persons other than the intended
> > recipient(s) is prohibited. If you receive this e-mail in error,
> > please notify the sender by phone or email immediately and delete it!
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Email:
[email protected]
[email protected]
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
