Hi Ben, I had already addressed all the comments. I was trying to figure out a way to reproduce the issue with a failing test before posting a new diff. I am sorry for the delay since it took me some time to figure out concepts like containerization etc. which were used in the test corresponding to disk usage.
I understand the point you are trying to make. However, this contradicts a statement which you made earlier. If we want the number of directories to affect the garbage collection "locally" then the only way is to re-order the sequence of directories which were scheduled for GC by making their delay a function of the number of executor directories. I will post a detailed solution (if I am able to figure out one) on the ticket before delving into the code next time. In the meanwhile, I will take up simpler issues to solve. Thanks for all the help and your time. Best Regards, On 4 March 2015 at 08:06, Benjamin Mahler <[email protected]> wrote: > Hey Ritwik, please hold off dealing with review comments. The current > approach here still seems to be not quite what we want. > > In the current patch, you're speeding up garbage collection "globally" > when an individual directory has too many child directories. This approach > works fine with "disk usage" as it is a global constraint we must deal > with. However, the "link count" is a per-directory constraint. This means > we want to be able to purge on a per-directory basis rather than globally. > > I apologize that I haven't had time to provide specific implementation > guidance and it's looking like this isn't a "newbie" ticket after all. > Bernd is pitching some time in here to help. In the meantime you might want > to pick up some other tickets that will require less shepherding from us! > > On Sat, Feb 28, 2015 at 4:54 AM, Bernd Mathiske <[email protected]> > wrote: > >> This is an automatically generated e-mail. To reply, visit: >> https://reviews.apache.org/r/30014/ >> >> First pass. >> >> >> src/slave/constants.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880941#file880941line64> >> (Diff >> revision 5) >> >> extern const Duration REGISTRATION_BACKOFF_FACTOR; >> >> 64 >> >> extern const double HARD_LINKS_THRESHOLD; >> >> The other constants have a comment. Why not this one? >> >> >> src/slave/flags.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line158> >> (Diff >> revision 5) >> >> public: >> >> 158 >> >> " 0.0, min((1.0 - gc_disk_headroom - disk usage),\n" >> >> Danger: is "-" left-associative here or right-associative? Better not to >> make any assumptions and put extra parentheses. >> >> It seems to me that the first expression is meant to be left-associative and >> the second right-associative. See more discussion why this is so below. >> >> Where is max_framework_link_usage defined? This comes out of context here. >> >> >> src/slave/flags.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line170> >> (Diff >> revision 5) >> >> public: >> >> 170 >> >> " 0.0, min((1.0 - gc_disk_headroom - disk usage),\n" >> >> Same as above. >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line237> >> (Diff >> revision 5) >> >> public: >> >> 236 >> >> // Garbage collects the directories based on the current disk usage. >> >> 237 >> >> // Garbage collects the directories based on the current disk usage and >> number >> >> It is probably worth mentioning here that you mean the maximum current >> number of executor directories in any active framework. >> >> While you are changing this sentence here, could you maybe also improve on >> the phrasing "the directories"? Using the definite article "the" without >> referring to anything in context makes it unclear to the reader what >> directories are meant. >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line242> >> (Diff >> revision 5) >> >> public: >> >> 240 >> >> void _checkDiskUsage(const process::Future<double>& usage); >> >> 242 >> >> void _checkGCParameters(const process::Future<GCParameters>& gc_params); >> >> >> 1. >> >> We don't use snake case, we use camel case for variable naming. Please >> also correct all other variables in this review. >> 2. >> >> No need to abbreviate "parameters" to "params". >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line639> >> (Diff >> revision 5) >> >> 639 >> >> // Getters >> >> Redundant comment. >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line640> >> (Diff >> revision 5) >> >> 640 >> >> double getDiskUsage() const; >> >> Why use getters and setters here at all? What is the benefit if all the >> setters are public? >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line643> >> (Diff >> revision 5) >> >> 643 >> >> // Setters >> >> Redundant comment. >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line648> >> (Diff >> revision 5) >> >> 648 >> >> // 'disk_usage' denotes a fraction of the disk that is currently being used >> >> "a fraction"? What other fractions are there? Do you mean "the fraction"? >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line649> >> (Diff >> revision 5) >> >> 649 >> >> // by the slave process. This usage is defined as a fraction of a limit >> >> "This usage is defined as a fraction of a limit specified by a flag." >> >> This is too nebulous. Suggestion: reference/point to a place with the actual >> definition. >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line652> >> (Diff >> revision 5) >> >> 652 >> >> // expressed as a fraction of the limit on the maximum number of >> >> "the maximum number of subdirectories" >> >> Please add " in any given parent directory" to be extra clear. Otherwise one >> could possibly read this mistakenly as: >> >> "the maximum number of subdirectories anywhere in a disk volume". >> >> >> src/slave/slave.hpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line656> >> (Diff >> revision 5) >> >> 656 >> >> Try<double> disk_usage, max_link_usage; >> >> >> 1. >> >> Please declare these two members separately, with separate comments. >> 2. >> >> "max_link_usage"? This sounds like a limit, not like a current gauge >> which it is. Suggestion: "topLinkUsage". >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3596> >> (Diff >> revision 5) >> >> void Slave::registerExecutorTimeout( >> >> 3596 >> >> 0.0, std::min((1.0 - flags.gc_disk_headroom - >> gc_params.getDiskUsage()), >> >> The indentation in the next line makes this hard to read. I'd say line >> this up this way: >> >> return flags.gc_delay * std::max( >> 0.0, std::min((1.0 - ...), >> (1.0 - ...; >> >> Even better: use variables with good descriptive naming for intermediate >> expression results. Or break out the formluae into small functions, even if >> they do not occur in multiple places. >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3597> >> (Diff >> revision 5) >> >> void Slave::registerExecutorTimeout( >> >> 3597 >> >> (1.0 - flags.gc_disk_headroom - gc_params.getMaxLinkUsage()))); >> >> >> 1. This can't be right. Subtracting link usage from disk head room? >> >> Such a glitch is exactly why I would like to suggest writing a unit test >> later on. Otherwise we may have this feature implemented, but it may not be >> working at all. >> >> 1. Left or right-associative use of "-" intended here? Suggestion: use >> extra parentheses! It looks like left-associative is correct here, but it >> won't be once you have replaced gc_disk_headroom with hard_links_threshold >> (hardLinksThreshold)... Then you have >> >> (1.0 - 0.8) - maxLinkUsage // seems wrong >> >> Proposal: Make these definitions more parallel in meaning. Use a headroom >> concept for hard links as well or change disk headroom to a disk usage >> threshold. I'd opt for the former, since it would touch less code. >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3637> >> (Diff >> revision 5) >> >> 3637 >> >> GCParameters params; >> >> params -> parameters >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3638> >> (Diff >> revision 5) >> >> 3638 >> >> params.setDiskUsage(fs::usage(flags.work_dir)); >> >> insert blank line above and below this line >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3640> >> (Diff >> revision 5) >> >> 3640 >> >> std::string executors_parent_path = paths::getExecutorsParentPath( >> >> std::string -> string >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3642> >> (Diff >> revision 5) >> >> 3642 >> >> struct stat s; >> >> insert blank line above this one and swap the two declarations, because >> lstat uses s, not limit. >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3645> >> (Diff >> revision 5) >> >> 3645 >> >> params.setMaxLinkUsage(ErrnoError( >> >> Hm. Is it really the best strategy to report nothing at all just because >> we are unsure about one single framweork? >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3650> >> (Diff >> revision 5) >> >> 3650 >> >> limit = ::pathconf(executors_parent_path.c_str(), _PC_LINK_MAX); >> >> Why not declare "limit" here? Why declare it in an outer scope and >> needlessly initialize it there? >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3651> >> (Diff >> revision 5) >> >> 3651 >> >> if (limit < 0 && errno != 0) { >> >> What about "limit < 0" and "errno == 0"? This is a possible response from >> pathconf. Then you keep going, dividing by zero in line 3656. >> >> Did you mean this? >> >> "limit < 0 || errno != 0" >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3656> >> (Diff >> revision 5) >> >> 3656 >> >> double link_usage_fraction = 1.0 * s.st_nlink / limit; >> >> What if we are on a system where the limit gets ignored and mkdir still >> works and GC for some reason has not caught up yet? Do we get usages above >> 1.0 then? Is this expected and handled well in therest of the code? >> >> >> src/slave/slave.cpp >> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3658> >> (Diff >> revision 5) >> >> 3658 >> >> std::max(params.getMaxLinkUsage(), link_usage_fraction)); >> >> Won't this crash if we have set maxLinkUsage to Error in line 3645? >> >> >> . >> >> - Bernd Mathiske >> >> On February 27th, 2015, 2:48 p.m. PST, Ritwik Yadav wrote: >> Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone. >> By Ritwik Yadav. >> >> *Updated Feb. 27, 2015, 2:48 p.m.* >> *Bugs: * MESOS-391 <https://issues.apache.org/jira/browse/MESOS-391> >> *Repository: * mesos >> Description >> >> A fix to enable the slave garbage collector to take the number of hard links >> (st_nlinks) into account when creating new directories. >> >> Testing >> >> Mesos builds sucessfully and all existing tests pass. >> >> Diffs >> >> - src/slave/constants.hpp (fd1c1aba0aa62372ab399bee5709ce81b8e92cec) >> - src/slave/constants.cpp (2a99b1105af0e2d62b956fa96988f477937a46bd) >> - src/slave/flags.hpp (56b25caf3901b38bdecb50310e8bcae0b114efa8) >> - src/slave/paths.hpp (1618439d728ded347ec75317ce8dd998acd7ee94) >> - src/slave/paths.cpp (01ea856aa2e628d4aee5fd31f7e49d147f740e8f) >> - src/slave/slave.hpp (7b58cade9ada5f0a56b4c60c502eb907ccae33b4) >> - src/slave/slave.cpp (9f31fa46304398e8f87b41b55d8f4cfd4aba10b9) >> - src/tests/gc_tests.cpp (deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf) >> >> View Diff <https://reviews.apache.org/r/30014/diff/> >> > > -- *Ritwik Yadav* Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur. Cell: +91-9635-152346 Twitter: @iRitwik <https://twitter.com/#%21/iRitwik>
