Hey Ritwik, please hold off dealing with review comments. The current approach here still seems to be not quite what we want.
In the current patch, you're speeding up garbage collection "globally" when an individual directory has too many child directories. This approach works fine with "disk usage" as it is a global constraint we must deal with. However, the "link count" is a per-directory constraint. This means we want to be able to purge on a per-directory basis rather than globally. I apologize that I haven't had time to provide specific implementation guidance and it's looking like this isn't a "newbie" ticket after all. Bernd is pitching some time in here to help. In the meantime you might want to pick up some other tickets that will require less shepherding from us! On Sat, Feb 28, 2015 at 4:54 AM, Bernd Mathiske <[email protected]> wrote: > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30014/ > > First pass. > > > src/slave/constants.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880941#file880941line64> > (Diff > revision 5) > > extern const Duration REGISTRATION_BACKOFF_FACTOR; > > 64 > > extern const double HARD_LINKS_THRESHOLD; > > The other constants have a comment. Why not this one? > > > src/slave/flags.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line158> > (Diff > revision 5) > > public: > > 158 > > " 0.0, min((1.0 - gc_disk_headroom - disk usage),\n" > > Danger: is "-" left-associative here or right-associative? Better not to > make any assumptions and put extra parentheses. > > It seems to me that the first expression is meant to be left-associative and > the second right-associative. See more discussion why this is so below. > > Where is max_framework_link_usage defined? This comes out of context here. > > > src/slave/flags.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line170> > (Diff > revision 5) > > public: > > 170 > > " 0.0, min((1.0 - gc_disk_headroom - disk usage),\n" > > Same as above. > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line237> > (Diff > revision 5) > > public: > > 236 > > // Garbage collects the directories based on the current disk usage. > > 237 > > // Garbage collects the directories based on the current disk usage and > number > > It is probably worth mentioning here that you mean the maximum current > number of executor directories in any active framework. > > While you are changing this sentence here, could you maybe also improve on > the phrasing "the directories"? Using the definite article "the" without > referring to anything in context makes it unclear to the reader what > directories are meant. > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line242> > (Diff > revision 5) > > public: > > 240 > > void _checkDiskUsage(const process::Future<double>& usage); > > 242 > > void _checkGCParameters(const process::Future<GCParameters>& gc_params); > > > 1. > > We don't use snake case, we use camel case for variable naming. Please > also correct all other variables in this review. > 2. > > No need to abbreviate "parameters" to "params". > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line639> > (Diff > revision 5) > > 639 > > // Getters > > Redundant comment. > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line640> > (Diff > revision 5) > > 640 > > double getDiskUsage() const; > > Why use getters and setters here at all? What is the benefit if all the > setters are public? > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line643> > (Diff > revision 5) > > 643 > > // Setters > > Redundant comment. > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line648> > (Diff > revision 5) > > 648 > > // 'disk_usage' denotes a fraction of the disk that is currently being used > > "a fraction"? What other fractions are there? Do you mean "the fraction"? > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line649> > (Diff > revision 5) > > 649 > > // by the slave process. This usage is defined as a fraction of a limit > > "This usage is defined as a fraction of a limit specified by a flag." > > This is too nebulous. Suggestion: reference/point to a place with the actual > definition. > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line652> > (Diff > revision 5) > > 652 > > // expressed as a fraction of the limit on the maximum number of > > "the maximum number of subdirectories" > > Please add " in any given parent directory" to be extra clear. Otherwise one > could possibly read this mistakenly as: > > "the maximum number of subdirectories anywhere in a disk volume". > > > src/slave/slave.hpp > <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line656> > (Diff > revision 5) > > 656 > > Try<double> disk_usage, max_link_usage; > > > 1. > > Please declare these two members separately, with separate comments. > 2. > > "max_link_usage"? This sounds like a limit, not like a current gauge which > it is. Suggestion: "topLinkUsage". > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3596> > (Diff > revision 5) > > void Slave::registerExecutorTimeout( > > 3596 > > 0.0, std::min((1.0 - flags.gc_disk_headroom - gc_params.getDiskUsage()), > > The indentation in the next line makes this hard to read. I'd say line this > up this way: > > return flags.gc_delay * std::max( > 0.0, std::min((1.0 - ...), > (1.0 - ...; > > Even better: use variables with good descriptive naming for intermediate > expression results. Or break out the formluae into small functions, even if > they do not occur in multiple places. > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3597> > (Diff > revision 5) > > void Slave::registerExecutorTimeout( > > 3597 > > (1.0 - flags.gc_disk_headroom - gc_params.getMaxLinkUsage()))); > > > 1. This can't be right. Subtracting link usage from disk head room? > > Such a glitch is exactly why I would like to suggest writing a unit test > later on. Otherwise we may have this feature implemented, but it may not be > working at all. > > 1. Left or right-associative use of "-" intended here? Suggestion: use > extra parentheses! It looks like left-associative is correct here, but it > won't be once you have replaced gc_disk_headroom with hard_links_threshold > (hardLinksThreshold)... Then you have > > (1.0 - 0.8) - maxLinkUsage // seems wrong > > Proposal: Make these definitions more parallel in meaning. Use a headroom > concept for hard links as well or change disk headroom to a disk usage > threshold. I'd opt for the former, since it would touch less code. > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3637> > (Diff > revision 5) > > 3637 > > GCParameters params; > > params -> parameters > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3638> > (Diff > revision 5) > > 3638 > > params.setDiskUsage(fs::usage(flags.work_dir)); > > insert blank line above and below this line > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3640> > (Diff > revision 5) > > 3640 > > std::string executors_parent_path = paths::getExecutorsParentPath( > > std::string -> string > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3642> > (Diff > revision 5) > > 3642 > > struct stat s; > > insert blank line above this one and swap the two declarations, because > lstat uses s, not limit. > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3645> > (Diff > revision 5) > > 3645 > > params.setMaxLinkUsage(ErrnoError( > > Hm. Is it really the best strategy to report nothing at all just because we > are unsure about one single framweork? > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3650> > (Diff > revision 5) > > 3650 > > limit = ::pathconf(executors_parent_path.c_str(), _PC_LINK_MAX); > > Why not declare "limit" here? Why declare it in an outer scope and > needlessly initialize it there? > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3651> > (Diff > revision 5) > > 3651 > > if (limit < 0 && errno != 0) { > > What about "limit < 0" and "errno == 0"? This is a possible response from > pathconf. Then you keep going, dividing by zero in line 3656. > > Did you mean this? > > "limit < 0 || errno != 0" > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3656> > (Diff > revision 5) > > 3656 > > double link_usage_fraction = 1.0 * s.st_nlink / limit; > > What if we are on a system where the limit gets ignored and mkdir still > works and GC for some reason has not caught up yet? Do we get usages above > 1.0 then? Is this expected and handled well in therest of the code? > > > src/slave/slave.cpp > <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3658> > (Diff > revision 5) > > 3658 > > std::max(params.getMaxLinkUsage(), link_usage_fraction)); > > Won't this crash if we have set maxLinkUsage to Error in line 3645? > > > . > > - Bernd Mathiske > > On February 27th, 2015, 2:48 p.m. PST, Ritwik Yadav wrote: > Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone. > By Ritwik Yadav. > > *Updated Feb. 27, 2015, 2:48 p.m.* > *Bugs: * MESOS-391 <https://issues.apache.org/jira/browse/MESOS-391> > *Repository: * mesos > Description > > A fix to enable the slave garbage collector to take the number of hard links > (st_nlinks) into account when creating new directories. > > Testing > > Mesos builds sucessfully and all existing tests pass. > > Diffs > > - src/slave/constants.hpp (fd1c1aba0aa62372ab399bee5709ce81b8e92cec) > - src/slave/constants.cpp (2a99b1105af0e2d62b956fa96988f477937a46bd) > - src/slave/flags.hpp (56b25caf3901b38bdecb50310e8bcae0b114efa8) > - src/slave/paths.hpp (1618439d728ded347ec75317ce8dd998acd7ee94) > - src/slave/paths.cpp (01ea856aa2e628d4aee5fd31f7e49d147f740e8f) > - src/slave/slave.hpp (7b58cade9ada5f0a56b4c60c502eb907ccae33b4) > - src/slave/slave.cpp (9f31fa46304398e8f87b41b55d8f4cfd4aba10b9) > - src/tests/gc_tests.cpp (deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf) > > View Diff <https://reviews.apache.org/r/30014/diff/> >
