And - I just saw another recent thread - could
http://tracker.ceph.com/issues/17177 be an explanation for most/all of
the above?

Next question(s) would then be:

   - How would one deal with duplicate stray(s)?
   - How would one deal with a mismatch between head items and
   fnode.fragstat - is ceph daemon mds.foo scrub_path the right tool?
   (See the sketch below.)
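
For reference, this is roughly what I have in mind for the scrub_path
route - very much a sketch, since I haven't verified which scrub flags
this build's admin socket command actually accepts (recursive/repair
below are assumptions on my part, and the path is a placeholder):

ceph daemon mds.foo scrub_path /some/suspect/subtree recursive repair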

-KJ

On Thu, Oct 6, 2016 at 5:05 PM, Kjetil Jørgensen <[email protected]>
wrote:

> Hi,
>
> context (i.e. what we're doing): We're migrating (or trying to migrate)
> off of an nfs server onto cephfs, for a workload that's best described as
> "big piles" of hardlinks. Essentially, we have a set of "sources":
> foo/01/<aa><rest-of-md5>
> foo/0b/<0b><rest-of-md5>
> .. and so on
> bar/02/..
> bar/0c/..
> .. and so on
>
> foo/bar/friends have been "cloned" numerous times to a set of names that,
> over the course of weeks, end up being recycled again; the clone is
> essentially cp -L foo copy-1-of-foo.
>
> We're doing "incremental" rsyncs of this onto cephfs, so the sense of "the
> original source of the hardlink" will end up moving around, depending on
> the whims of rsync. (If it matters, I found some allusion to "if the
> original hardlinked file is deleted, ...".)
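>
> For what it's worth, the sync itself is roughly the following - paths
> and the exact option set are placeholders from memory, the important bit
> being -H so rsync preserves the hardlink structure:
>
> rsync -aH --delete /mnt/nfs/foo/ /mnt/cephfs/foo/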
>
> For RBD the ceph cluster has mostly been rather well behaved; the
> problems we have had have for the most part been self-inflicted. Before
> introducing the hardlink spectacle to cephfs, the same filesystem was used
> for light-ish read-mostly loads, being mostly un-eventful. (That being
> said, we did patch it for
>
> Cluster is v10.2.2 (mds v10.2.2+4d15eb12298e007744486e28924a6f0ae071bd06),
> clients are ubuntu's 4.4.0-32 kernel(s), and elrepo v4.4.4.
>
> The problems we're facing:
>
>    - Maybe a "non-problem": I have ~6M strays sitting around
>    - Slightly more problematic, I have duplicate stray(s)? See log
>    excerpts below. Also, rados -p cephfs_metadata listomapkeys 60X.00000000
>    did/does seem to agree with there being duplicate strays (assuming
>    60X.00000000 are the directory index objects for the stray catalogs;
>    see the sketch after this list), caveat "not a perfect snapshot", as
>    listomapkeys was issued in serial fashion.
>    - We stumbled across http://tracker.ceph.com/issues/17177 (mostly
>    included here for more context)
>    - There have been a couple of instances of invalid backtrace(s), mostly
>    solved by either mds:scrub_path or just unlinking the files/directories in
>    question and re-rsync-ing.
>    - Mismatch between head items and fnode.fragstat (see below for more
>    of the log excerpt); this appeared to have been solved by mds:scrub_path
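>
> The sketch referenced above - counting dentries per stray catalog. The
> object names rest on my assumption that mds0's stray dirs are inodes
> 0x600 through 0x609, i.e. objects 600.00000000 .. 609.00000000 in the
> metadata pool:
>
> for i in $(seq 0 9); do
>   obj="60${i}.00000000"
>   echo -n "${obj}: "
>   rados -p cephfs_metadata listomapkeys "${obj}" | wc -l
> done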
>
>
> Duplicate stray(s), ceph-mds complains (a lot, during rsync):
> 2016-09-30 20:00:21.978314 7ffb653b8700  0 mds.0.cache.dir(603) _fetched
>  badness: got (but i already had) [inode 10003f25eaf [...2,head]
> ~mds0/stray0/10003f25eaf auth v38836572 s=8998 nl=5 n(v0 b8998 1=1+0)
> (iversion lock) 0x561082e6b520] mode 33188 mtime 2016-07-25 03:02:50.000000
> 2016-09-30 20:00:21.978336 7ffb653b8700 -1 log_channel(cluster) log [ERR]
> : loaded dup inode 10003f25eaf [2,head] v36792929 at
> ~mds0/stray3/10003f25eaf, but inode 10003f25eaf.head v38836572 already
> exists at ~mds0/stray0/10003f25eaf
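>
> Something along these lines should confirm a specific collision from
> both stray dirs - assuming the dirfrag omap key format is
> "<hex-ino>_head", which I haven't verified against the source:
>
> rados -p cephfs_metadata getomapval 600.00000000 10003f25eaf_head
> rados -p cephfs_metadata getomapval 603.00000000 10003f25eaf_head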
>
> I briefly ran ceph-mds with debug_mds=20/20, which didn't yield anything
> immediately useful beyond making the control flow of src/mds/CDir.cc
> slightly easier to follow, without me becoming much wiser.
> 2016-09-30 20:43:51.910754 7ffb653b8700 20 mds.0.cache.dir(606) _fetched
> pos 310473 marker 'I' dname '100022e8617 [2,head]
> 2016-09-30 20:43:51.910757 7ffb653b8700 20 mds.0.cache.dir(606) lookup
> (head, '100022e8617')
> 2016-09-30 20:43:51.910759 7ffb653b8700 20 mds.0.cache.dir(606)   miss ->
> (10002a81c10,head)
> 2016-09-30 20:43:51.910762 7ffb653b8700  0 mds.0.cache.dir(606) _fetched
>  badness: got (but i already had) [inode 100022e8617 [...2,head]
> ~mds0/stray9/100022e8617 auth v39303851 s=11470 nl=10 n(v0 b11470 1=1+0)
> (iversion lock) 0x560c013904b8] mode 33188 mtime 2016-07-25 03:38:01.000000
> 2016-09-30 20:43:51.910772 7ffb653b8700 -1 log_channel(cluster) log [ERR]
> : loaded dup inode 100022e8617 [2,head] v39284583 at
> ~mds0/stray6/100022e8617, but inode 100022e8617.head v39303851 already
> exists at ~mds0/stray9/100022e8617
>
>
> 2016-09-25 06:23:50.947761 7ffb653b8700  1 mds.0.cache.dir(10003439a33)
> mismatch between head items and fnode.fragstat! printing dentries
> 2016-09-25 06:23:50.947779 7ffb653b8700  1 mds.0.cache.dir(10003439a33)
> get_num_head_items() = 36; fnode.fragstat.nfiles=53
> fnode.fragstat.nsubdirs=0
> 2016-09-25 06:23:50.947782 7ffb653b8700  1 mds.0.cache.dir(10003439a33)
> mismatch between child accounted_rstats and my rstats!
> 2016-09-25 06:23:50.947803 7ffb653b8700  1 mds.0.cache.dir(10003439a33)
> total of child dentrys: n(v0 b19365007 36=36+0)
> 2016-09-25 06:23:50.947806 7ffb653b8700  1 mds.0.cache.dir(10003439a33) my
> rstats:              n(v2 rc2016-08-28 04:48:37.685854 b49447206 53=53+0)
>
> The slightly sad thing is - I suspect all of this is probably from
> something that "happened at some time in the past", and running the mds
> with debugging will make my users very unhappy, as writing/formatting all
> that log is not exactly cheap (with debug_mds=20/20, the mds beacon
> quickly ended up being marked as laggy).
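>
> If I do have to go down that route again, the idea would be to only bump
> the level for a short window via the admin socket, roughly as below
> (assuming "config set" over the admin socket accepts the same log/gather
> syntax as ceph.conf):
>
> ceph daemon mds.foo config set debug_mds 20/20
> ... reproduce for a minute or two ...
> ceph daemon mds.foo config set debug_mds 1/5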
>
> Bonus question: In terms of "understanding how cephfs works", is
> doc/dev/mds_internals it? :) Given that making "minimal reproducible
> test-cases" is so far turning out to be quite elusive from the "top down"
> approach, I'm finding myself looking inside the box to try to figure out
> how we got where we are.
>
> (And many thanks for ceph-dencoder; it satisfies my pathological need to
> look inside of things.)
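>
> For instance, decoding the "parent" backtrace xattr off an object in the
> data pool is roughly the below - object name and pool name are
> placeholders, and inode_backtrace_t being the right dencoder type is my
> assumption:
>
> rados -p cephfs_data getxattr 10000000000.00000000 parent > /tmp/bt
> ceph-dencoder type inode_backtrace_t import /tmp/bt decode dump_json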
>
> Cheers,
> --
> Kjetil Joergensen <[email protected]>
> SRE, Medallia Inc
> Phone: +1 (650) 739-6580
>



-- 
Kjetil Joergensen <[email protected]>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
