Hi John,
Seems like you're right... strange that it appeared to work with only one MDS
before I shut the cluster down. Here is the `ceph fs get` output for the two
file systems:
[root@carf-ceph-osd15 ~]# ceph fs get carf_ceph_kube01
Filesystem 'carf_ceph_kube01' (2)
fs_name carf_ceph_kube01
epoch 22
flags 8
created 2017-08-21 12:10:57.948579
modified 2017-08-21 12:10:57.948579
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 1218
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses
versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {}
failed 0
damaged
stopped
data_pools [23]
metadata_pool 24
inline_data disabled
balancer
standby_count_wanted 0
[root@carf-ceph-osd15 ~]#
[root@carf-ceph-osd15 ~]# ceph fs get carf_ceph02
Filesystem 'carf_ceph02' (1)
fs_name carf_ceph02
epoch 26
flags 8
created 2017-08-18 14:20:50.152054
modified 2017-08-18 14:20:50.152054
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 1198
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses
versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=474299}
failed
damaged
stopped
data_pools [21]
metadata_pool 22
inline_data disabled
balancer
standby_count_wanted 0
474299: 7.128.13.69:6800/304042158 'carf-ceph-osd15' mds.0.23 up:active seq 5
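Comparing the two outputs: carf_ceph_kube01 shows "up {}" with no MDS in the
map, while carf_ceph02 has mds.0 up:active on carf-ceph-osd15, so it does look
like I have two file systems but only one MDS daemon. If that's the problem, I
assume the fix is just to start a second MDS on another node, roughly like this
(the second host name here is hypothetical, and this assumes an MDS has already
been provisioned there with its keyring):

[root@carf-ceph-osd16 ~]# systemctl start ceph-mds@carf-ceph-osd16
[root@carf-ceph-osd15 ~]# ceph mds stat

after which "ceph mds stat" should report an active MDS for each file system.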
I also tried specifying the mds_namespace option on the mount command
(http://docs.ceph.com/docs/master/cephfs/kernel/), but it doesn't seem to be
accepted:
[ceph-admin@carf-ceph-osd04 ~]$ sudo mount -t ceph carf-ceph-osd15:6789:/
/mnt/carf_ceph02/ -o
mds_namespace=carf_ceph02,name=cephfs.k8test,secretfile=k8test.secret
mount error 22 = Invalid argument
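From what I can tell, error 22 (EINVAL) here probably just means the kernel
client on this node is too old to recognize the mds_namespace option (I believe
support was only added around kernel 4.8). If so, ceph-fuse may work instead,
since it takes the file system name as a client-side option; something like
this (client name and mount point carried over from the attempt above, with the
keyring assumed to be in the default /etc/ceph location):

[ceph-admin@carf-ceph-osd04 ~]$ sudo ceph-fuse --id cephfs.k8test --client_mds_namespace=carf_ceph02 /mnt/carf_ceph02/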
Thanks,
-Bryan
-----Original Message-----
From: John Spray [mailto:[email protected]]
Sent: Tuesday, August 22, 2017 11:18 AM
To: Bryan Banister <[email protected]>
Cc: [email protected]
Subject: Re: [ceph-users] Help with file system with failed mds daemon
On Tue, Aug 22, 2017 at 4:58 PM, Bryan Banister
<[email protected]<mailto:[email protected]>> wrote:
> Hi all,
>
>
>
> I’m still new to ceph and cephfs. Trying out the multi-fs configuration on
> a Luminous test cluster. I shut down the cluster to do an upgrade, and when
> I brought the cluster back up I got a warning that one of the file
> systems has a failed mds daemon:
>
>
>
> 2017-08-21 17:00:00.000081 mon.carf-ceph-osd15 [WRN] overall HEALTH_WARN 1
> filesystem is degraded; 1 filesystem is have a failed mds daemon; 1 pools
> have many more objects per pg than average; application not enabled on 9
> pool(s)
>
>
>
> I tried restarting the mds service on the system and it doesn’t seem to
> indicate any problems:
>
> 2017-08-21 16:13:40.979449 7fffed8b0700 1 mds.0.20 shutdown: shutting down
> rank 0
>
> 2017-08-21 16:13:41.012167 7ffff7fde1c0 0 set uid:gid to 167:167
> (ceph:ceph)
>
> 2017-08-21 16:13:41.012180 7ffff7fde1c0 0 ceph version 12.1.4
> (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc), process (unknown),
> pid 16656
>
> 2017-08-21 16:13:41.014105 7ffff7fde1c0 0 pidfile_write: ignore empty
> --pid-file
>
> 2017-08-21 16:13:45.541442 7ffff10b7700 1 mds.0.23 handle_mds_map i am now
> mds.0.23
>
> 2017-08-21 16:13:45.541449 7ffff10b7700 1 mds.0.23 handle_mds_map state
> change up:boot --> up:replay
>
> 2017-08-21 16:13:45.541459 7ffff10b7700 1 mds.0.23 replay_start
>
> 2017-08-21 16:13:45.541466 7ffff10b7700 1 mds.0.23 recovery set is
>
> 2017-08-21 16:13:45.541475 7ffff10b7700 1 mds.0.23 waiting for osdmap 1198
> (which blacklists prior instance)
>
> 2017-08-21 16:13:45.565779 7fffea8aa700 0 mds.0.cache creating system inode
> with ino:0x100
>
> 2017-08-21 16:13:45.565920 7fffea8aa700 0 mds.0.cache creating system inode
> with ino:0x1
>
> 2017-08-21 16:13:45.571747 7fffe98a8700 1 mds.0.23 replay_done
>
> 2017-08-21 16:13:45.571751 7fffe98a8700 1 mds.0.23 making mds journal
> writeable
>
> 2017-08-21 16:13:46.542148 7ffff10b7700 1 mds.0.23 handle_mds_map i am now
> mds.0.23
>
> 2017-08-21 16:13:46.542149 7ffff10b7700 1 mds.0.23 handle_mds_map state
> change up:replay --> up:reconnect
>
> 2017-08-21 16:13:46.542158 7ffff10b7700 1 mds.0.23 reconnect_start
>
> 2017-08-21 16:13:46.542161 7ffff10b7700 1 mds.0.23 reopen_log
>
> 2017-08-21 16:13:46.542171 7ffff10b7700 1 mds.0.23 reconnect_done
>
> 2017-08-21 16:13:47.543612 7ffff10b7700 1 mds.0.23 handle_mds_map i am now
> mds.0.23
>
> 2017-08-21 16:13:47.543616 7ffff10b7700 1 mds.0.23 handle_mds_map state
> change up:reconnect --> up:rejoin
>
> 2017-08-21 16:13:47.543623 7ffff10b7700 1 mds.0.23 rejoin_start
>
> 2017-08-21 16:13:47.543638 7ffff10b7700 1 mds.0.23 rejoin_joint_start
>
> 2017-08-21 16:13:47.543666 7ffff10b7700 1 mds.0.23 rejoin_done
>
> 2017-08-21 16:13:48.544768 7ffff10b7700 1 mds.0.23 handle_mds_map i am now
> mds.0.23
>
> 2017-08-21 16:13:48.544771 7ffff10b7700 1 mds.0.23 handle_mds_map state
> change up:rejoin --> up:active
>
> 2017-08-21 16:13:48.544779 7ffff10b7700 1 mds.0.23 recovery_done --
> successful recovery!
>
> 2017-08-21 16:13:48.544924 7ffff10b7700 1 mds.0.23 active_start
>
> 2017-08-21 16:13:48.544954 7ffff10b7700 1 mds.0.23 cluster recovered.
>
>
>
> This seems like an easy problem to fix. Any help is greatly appreciated!
I wonder if you have two filesystems but only one MDS? Ceph will then
think that the second filesystem "has a failed MDS" because there
isn't an MDS online to service it.
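Something like "ceph mds stat" or "ceph fs status" should confirm it: with two
filesystems you need at least two MDS daemons, so that one can go active for
each (plus ideally a standby).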
John
>
> -Bryan
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com