Hi, 

If you are using the kernel client, I would suggest trying something newer than 
3.10.x. I ran into this issue in the past, and it was fixed by upgrading my 
kernel. You may want to check the OS recommendations page as well: 
http://docs.ceph.com/docs/master/start/os-recommendations/

ELRepo maintains mainline RPMs for EL6 and EL7: http://elrepo.org/tiki/kernel-ml
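
For reference, installing a mainline kernel from ELRepo on EL7 looks roughly 
like this (the release RPM version below is just an example; check elrepo.org 
for the current one):

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# yum --enablerepo=elrepo-kernel install kernel-ml

Then reboot into the new kernel and confirm with uname -r.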

Alternatively, you could try the FUSE client.
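
The FUSE client runs in userspace, so it sidesteps kernel-version issues 
entirely. Roughly (the monitor address is taken from your ceph -s output and 
the mountpoint from your listings; adjust as needed):

# yum install ceph-fuse
# ceph-fuse -m 117.103.102.128:6789 /cephfs

It is generally somewhat slower than the kernel client, but tends to track 
fixes in the Ceph releases themselves rather than the kernel.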

—Lincoln

> On Mar 23, 2016, at 11:12 AM, FaHui Lin <[email protected]> wrote:
> 
> Dear Ceph experts,
> 
> We run into a nasty problem with our CephFS from time to time:
> 
> When we try to list a directory under CephFS, some files or directories do 
> not show up. For example:
> 
> This is the complete directory content:
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 100001 1559018781 Feb  2 07:43 dir-A
> drwxr-xr-x 1 10035 100001    9061906 Apr 15  2015 dir-B
> -rw-r--r-- 1 10035 100001  130750361 Aug  6  2015 file-1
> -rw-r--r-- 1 10035 100001   72640608 Apr 15  2015 file-2
> 
> But sometimes only part of the files/directories show up when listing, e.g.:
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 100001 1559018781 Feb  2 07:43 dir-A
> -rw-r--r-- 1 10035 100001   72640608 Apr 15  2015 file-2
> Here dir-B and file-1 are missing.
> 
> We found the files themselves are still intact, since we can still see them on 
> another node mounting the same cephfs, or just at another time. So we think 
> this is a metadata problem.
> 
> One thing we found interesting(?) is that remounting cephfs or restarting the 
> MDS service will NOT help, but creating a new file under the directory may help:
> 
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 100001 1559018781 Feb  2 07:43 dir-A
> -rw-r--r-- 1 10035 100001   72640608 Apr 15  2015 file-2
> # touch /cephfs/ies/home/mika/file-tmp
> # ll /cephfs/ies/home/mika
> drwxr-xr-x 1 10035 100001 1559018781 Feb  2 07:43 dir-A
> drwxr-xr-x 1 10035 100001    9061906 Apr 15  2015 dir-B
> -rw-r--r-- 1 10035 100001  130750361 Aug  6  2015 file-1
> -rw-r--r-- 1 10035 100001   72640608 Apr 15  2015 file-2
> -rw-r--r-- 1 root  root            0 Mar 23 15:34 file-tmp
> 
> 
> Strangely, when this happens, the ceph cluster health usually shows HEALTH_OK, 
> and there are no significant errors in the MDS or other service logs.
> 
> One thing we tried is increasing the MDS mds_cache_size to 1600000 (16x the 
> default value), which does help alleviate warnings like "mds0: Client failing 
> to respond to cache pressure", but does not solve the missing-metadata problem.
> 
> Here's our ceph server info:
> 
> # ceph -s
>     cluster d15a2cdb-354c-4bcd-a246-23521f1a7122
>      health HEALTH_OK
>      monmap e1: 3 mons at 
> {as-ceph01=117.103.102.128:6789/0,as-ceph02=117.103.103.93:6789/0,as-ceph03=117.103.109.124:6789/0}
>             election epoch 6, quorum 0,1,2 as-ceph01,as-ceph02,as-ceph03
>      mdsmap e144: 1/1/1 up {0=as-ceph02=up:active}, 1 up:standby
>      osdmap e178: 10 osds: 10 up, 10 in
>             flags sortbitwise
>       pgmap v105168: 256 pgs, 4 pools, 505 GB data, 1925 kobjects
>             1083 GB used, 399 TB / 400 TB avail
>                  256 active+clean
>   client io 614 B/s rd, 0 op/s
> 
> # ceph --version
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> 
> (We also hit the same problem on the Hammer release.)
> 
> # uname -r
> 3.10.0-327.10.1.el7.x86_64
> 
> We're using CentOS 7 servers.
> 
> # ceph daemon mds.as-ceph02 perf dump
> {
>     "mds": {
>         "request": 76066,
>         "reply": 76066,
>         "reply_latency": {
>             "avgcount": 76066,
>             "sum": 61.151796797
>         },
>         "forward": 0,
>         "dir_fetch": 1050,
>         "dir_commit": 1017,
>         "dir_split": 0,
>         "inode_max": 1600000,
>         "inodes": 130657,
>         "inodes_top": 110882,
>         "inodes_bottom": 19775,
>         "inodes_pin_tail": 0,
>         "inodes_pinned": 99670,
>         "inodes_expired": 0,
>         "inodes_with_caps": 99606,
>         "caps": 105119,
>         "subtrees": 2,
>         "traverse": 81583,
>         "traverse_hit": 74090,
>         "traverse_forward": 0,
>         "traverse_discover": 0,
>         "traverse_dir_fetch": 24,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 80,
>         "load_cent": 7606600,
>         "q": 0,
>         "exported": 0,
>         "exported_inodes": 0,
>         "imported": 0,
>         "imported_inodes": 0
>     },
>     "mds_cache": {
>         "num_strays": 120,
>         "num_strays_purging": 0,
>         "num_strays_delayed": 0,
>         "num_purge_ops": 0,
>         "strays_created": 17276,
>         "strays_purged": 17155,
>         "strays_reintegrated": 1,
>         "strays_migrated": 0,
>         "num_recovering_processing": 0,
>         "num_recovering_enqueued": 0,
>         "num_recovering_prioritized": 0,
>         "recovery_started": 0,
>         "recovery_completed": 0
>     },
>     "mds_log": {
>         "evadd": 116253,
>         "evex": 123148,
>         "evtrm": 123148,
>         "ev": 22378,
>         "evexg": 0,
>         "evexd": 17,
>         "segadd": 157,
>         "segex": 157,
>         "segtrm": 157,
>         "seg": 31,
>         "segexg": 0,
>         "segexd": 1,
>         "expos": 53624211952,
>         "wrpos": 53709306372,
>         "rdpos": 53354921818,
>         "jlat": 0
>     },
>     "mds_mem": {
>         "ino": 129334,
>         "ino+": 146489,
>         "ino-": 17155,
>         "dir": 3961,
>         "dir+": 4741,
>         "dir-": 780,
>         "dn": 130657,
>         "dn+": 163760,
>         "dn-": 33103,
>         "cap": 105119,
>         "cap+": 122281,
>         "cap-": 17162,
>         "rss": 444444,
>         "heap": 50108,
>         "malloc": 402511,
>         "buf": 0
>     },
>     "mds_server": {
>         "handle_client_request": 76066,
>         "handle_slave_request": 0,
>         "handle_client_session": 176954,
>         "dispatch_client_request": 80245,
>         "dispatch_server_request": 0
>     },
>     "objecter": {
>         "op_active": 0,
>         "op_laggy": 0,
>         "op_send": 61860,
>         "op_send_bytes": 0,
>         "op_resend": 0,
>         "op_ack": 7719,
>         "op_commit": 54141,
>         "op": 61860,
>         "op_r": 7719,
>         "op_w": 54141,
>         "op_rmw": 0,
>         "op_pg": 0,
>         "osdop_stat": 119,
>         "osdop_create": 26905,
>         "osdop_read": 21,
>         "osdop_write": 8537,
>         "osdop_writefull": 254,
>         "osdop_append": 0,
>         "osdop_zero": 1,
>         "osdop_truncate": 0,
>         "osdop_delete": 17325,
>         "osdop_mapext": 0,
>         "osdop_sparse_read": 0,
>         "osdop_clonerange": 0,
>         "osdop_getxattr": 7695,
>         "osdop_setxattr": 53810,
>         "osdop_cmpxattr": 0,
>         "osdop_rmxattr": 0,
>         "osdop_resetxattrs": 0,
>         "osdop_tmap_up": 0,
>         "osdop_tmap_put": 0,
>         "osdop_tmap_get": 0,
>         "osdop_call": 0,
>         "osdop_watch": 0,
>         "osdop_notify": 0,
>         "osdop_src_cmpxattr": 0,
>         "osdop_pgls": 0,
>         "osdop_pgls_filter": 0,
>         "osdop_other": 1111,
>         "linger_active": 0,
>         "linger_send": 0,
>         "linger_resend": 0,
>         "linger_ping": 0,
>         "poolop_active": 0,
>         "poolop_send": 0,
>         "poolop_resend": 0,
>         "poolstat_active": 0,
>         "poolstat_send": 0,
>         "poolstat_resend": 0,
>         "statfs_active": 0,
>         "statfs_send": 0,
>         "statfs_resend": 0,
>         "command_active": 0,
>         "command_send": 0,
>         "command_resend": 0,
>         "map_epoch": 178,
>         "map_full": 0,
>         "map_inc": 3,
>         "osd_sessions": 55,
>         "osd_session_open": 182,
>         "osd_session_close": 172,
>         "osd_laggy": 0,
>         "omap_wr": 1972,
>         "omap_rd": 2102,
>         "omap_del": 40
>     },
>     "throttle-msgr_dispatch_throttler-mds": {
>         "val": 0,
>         "max": 104857600,
>         "get": 450630,
>         "get_sum": 135500995,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 0,
>         "take_sum": 0,
>         "put": 450630,
>         "put_sum": 135500995,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000
>         }
>     },
>     "throttle-objecter_bytes": {
>         "val": 0,
>         "max": 104857600,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 61860,
>         "take_sum": 453992030,
>         "put": 44433,
>         "put_sum": 453992030,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000
>         }
>     },
>     "throttle-objecter_ops": {
>         "val": 0,
>         "max": 1024,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 61860,
>         "take_sum": 61860,
>         "put": 61860,
>         "put_sum": 61860,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000
>         }
>     }
> }
> 
> 
> This problem troubles us greatly, since our cephfs serves as a shared network 
> file-system for 100+ computing nodes (mounted with mount.ceph), and it causes 
> jobs doing I/O on cephfs to fail.
> 
> 
> I'd like to ask:
> 
> 1) What could be the main cause of this problem? Or, how can we trace it?
> Unfortunately, we cannot reproduce the problem on purpose; it just happens 
> occasionally.
> 
> 2) Since our cephfs is now in production use, do you have any suggestions for 
> improving its stability?
> We have 100+ computing nodes requiring a shared file-system containing tens of 
> millions of files, and I wonder whether a single MDS server can handle them 
> well.
> Should we use the ceph-fuse mount or the kernel mount? Should we have only 3~5 
> servers mount cephfs and share the mountpoint with the other nodes over NFS, 
> to reduce the load on the MDS server? What is a proper cluster structure for 
> using cephfs?
> 
> Any advice or comment will be appreciated. Thank you.
> 
> Best Regards,
> FaHui
> 
> 
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
