And - I just saw another recent thread - http://tracker.ceph.com/issues/17177 - which may explain most/all of the above?

Next question(s) would then be:
 - How would one deal with duplicate stray(s)?
 - How would one deal with a mismatch between head items and fnode.fragstat - ceph daemon mds.foo scrub_path?
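To make the questions concrete, this is roughly what I've been doing / am considering - very much a sketch; the directory path is a placeholder, and I'm not certain scrub_path's recursive/repair options actually rewrite fragstat on 10.2.x:

  # look for the same dentry key showing up in more than one of mds0's
  # stray dirfrags (600.00000000 .. 609.00000000 in the metadata pool)
  for x in $(seq 0 9); do
    rados -p cephfs_metadata listomapkeys 60$x.00000000
  done | sort | uniq -d

  # for the fragstat mismatch: flush the journal, then ask the mds to
  # re-scrub (and hopefully repair) the directory in question
  ceph daemon mds.foo flush journal
  ceph daemon mds.foo scrub_path /path/to/suspect/dir recursive repair

Is that roughly the intended tooling, or is there a better way? (I've also put a note below the quoted message on how I've been peeking at backtraces with ceph-dencoder, in case that's relevant.)

-KJ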
On Thu, Oct 6, 2016 at 5:05 PM, Kjetil Jørgensen <[email protected]> wrote:
> Hi,
>
> context (i.e. what we're doing): We're migrating (or trying to migrate)
> off of an nfs server onto cephfs, for a workload that's best described as
> "big piles" of hardlinks. Essentially, we have a set of "sources":
> foo/01/<aa><rest-of-md5>
> foo/0b/<0b><rest-of-md5>
> .. and so on
> bar/02/..
> bar/0c/..
> .. and so on
>
> foo/bar/friends have been "cloned" numerous times to a set of names that
> over the course of weeks end up being recycled again; the clone is
> essentially cp -L foo copy-1-of-foo.
>
> We're doing "incremental" rsyncs of this onto cephfs, so the sense of "the
> original source of the hardlink" will end up moving around, depending on
> the whims of rsync. (If it matters, I found some allusion to "if the
> original file hardlinked is deleted, ...".)
>
> For RBD the ceph cluster has mostly been rather well behaved; the
> problems we have had have for the most part been self-inflicted. Before
> introducing the hardlink spectacle to cephfs, the same filesystem was used
> for light-ish read-mostly loads, being mostly un-eventful. (That being
> said, we did patch it for
>
> Cluster is v10.2.2 (mds v10.2.2+4d15eb12298e007744486e28924a6f0ae071bd06),
> clients are ubuntu's 4.4.0-32 kernel(s), and elrepo v4.4.4.
>
> The problems we're facing:
>
> - Maybe a "non-problem": I have ~6M strays sitting around.
> - Slightly more problematic: I have duplicate stray(s)? See log
>   excerpts below. Also, rados -p cephfs_metadata listomapkeys 60X.00000000
>   did/does seem to agree with there being duplicate strays (assuming
>   60X.00000000 are the directory indexes for the stray catalogs), caveat
>   "not a perfect snapshot", listomapkeys issued in serial fashion.
> - We stumbled across http://tracker.ceph.com/issues/17177 (mostly
>   here for more context).
> - There have been a couple of instances of invalid backtrace(s), mostly
>   solved by either mds:scrub_path or just unlinking the files/directories
>   in question and re-rsync-ing.
> - Mismatch between head items and fnode.fragstat (see below for more
>   of the log excerpt), appeared to have been solved by mds:scrub_path.
>
> Duplicate stray(s), ceph-mds complains (a lot, during rsync):
>
> 2016-09-30 20:00:21.978314 7ffb653b8700 0 mds.0.cache.dir(603) _fetched
> badness: got (but i already had) [inode 10003f25eaf [...2,head]
> ~mds0/stray0/10003f25eaf auth v38836572 s=8998 nl=5 n(v0 b8998 1=1+0)
> (iversion lock) 0x561082e6b520] mode 33188 mtime 2016-07-25 03:02:50.000000
> 2016-09-30 20:00:21.978336 7ffb653b8700 -1 log_channel(cluster) log [ERR]
> : loaded dup inode 10003f25eaf [2,head] v36792929 at
> ~mds0/stray3/10003f25eaf, but inode 10003f25eaf.head v38836572 already
> exists at ~mds0/stray0/10003f25eaf
>
> I briefly ran ceph-mds with debug_mds=20/20, which didn't yield anything
> immediately useful beyond making the control-flow of src/mds/CDir.cc
> slightly easier to follow, without becoming much wiser.
> 2016-09-30 20:43:51.910754 7ffb653b8700 20 mds.0.cache.dir(606) _fetched
> pos 310473 marker 'I' dname '100022e8617 [2,head]
> 2016-09-30 20:43:51.910757 7ffb653b8700 20 mds.0.cache.dir(606) lookup
> (head, '100022e8617')
> 2016-09-30 20:43:51.910759 7ffb653b8700 20 mds.0.cache.dir(606) miss ->
> (10002a81c10,head)
> 2016-09-30 20:43:51.910762 7ffb653b8700 0 mds.0.cache.dir(606) _fetched
> badness: got (but i already had) [inode 100022e8617 [...2,head]
> ~mds0/stray9/100022e8617 auth v39303851 s=11470 nl=10 n(v0 b11470 1=1+0)
> (iversion lock) 0x560c013904b8] mode 33188 mtime 2016-07-25 03:38:01.000000
> 2016-09-30 20:43:51.910772 7ffb653b8700 -1 log_channel(cluster) log [ERR]
> : loaded dup inode 100022e8617 [2,head] v39284583 at
> ~mds0/stray6/100022e8617, but inode 100022e8617.head v39303851 already
> exists at ~mds0/stray9/100022e8617
>
> Mismatch between head items and fnode.fragstat:
>
> 2016-09-25 06:23:50.947761 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
> mismatch between head items and fnode.fragstat! printing dentries
> 2016-09-25 06:23:50.947779 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
> get_num_head_items() = 36; fnode.fragstat.nfiles=53
> fnode.fragstat.nsubdirs=0
> 2016-09-25 06:23:50.947782 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
> mismatch between child accounted_rstats and my rstats!
> 2016-09-25 06:23:50.947803 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
> total of child dentrys: n(v0 b19365007 36=36+0)
> 2016-09-25 06:23:50.947806 7ffb653b8700 1 mds.0.cache.dir(10003439a33) my
> rstats: n(v2 rc2016-08-28 04:48:37.685854 b49447206 53=53+0)
>
> The slightly sad thing is - I suspect all of this is probably from
> something that "happened at some time in the past", and running the mds
> with debugging will make my users very unhappy, as writing/formatting all
> that log is not exactly cheap (debug_mds=20/20 quickly ended up with the
> mds beacon marked as laggy).
>
> Bonus question: in terms of "understanding how cephfs works", is
> doc/dev/mds_internals it? :) Given that making "minimal reproducible
> test-cases" has so far turned out to be quite elusive from the "top down"
> approach, I'm finding myself looking inside the box to try to figure out
> how we got where we are.
>
> (And many thanks for ceph-dencoder, it satisfies my pathological need to
> look inside of things.)
>
> Cheers,
> --
> Kjetil Joergensen <[email protected]>
> SRE, Medallia Inc
> Phone: +1 (650) 739-6580
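(Re: the "looking inside the box" / ceph-dencoder bit above - for the invalid backtrace cases, this is roughly how I've been peeking at what's on disk. It assumes the data pool is named cephfs_data and that a file's backtrace lives in the "parent" xattr of its first object, which is my understanding of the layout - so a sketch, not gospel:

  # backtrace of a file inode, e.g. 10003f25eaf from the dup-stray log above
  rados -p cephfs_data getxattr 10003f25eaf.00000000 parent > /tmp/bt
  ceph-dencoder type inode_backtrace_t import /tmp/bt decode dump_json

Comparing that against where the dentry currently lives is how I've been spotting the stale/invalid backtraces.)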
--
Kjetil Joergensen <[email protected]>
SRE, Medallia Inc
Phone: +1 (650) 739-6580

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
