It's not clear if we still want to ensure GC events are ordered globally.
We might want to break that order if we have per-item constraints to deal
with (e.g. link count).

Can you elaborate on the delay approach you're mentioning?

On Wed, Mar 4, 2015 at 9:30 AM, Ritwik <[email protected]> wrote:

> Hi Ben,
>
> I had already addressed all the comments. I was trying to figure out a way
> to reproduce the issue with a failing test before posting a new diff. I am
> sorry for the delay since it took me some time to figure out concepts like
> containerization etc. which were used in the test corresponding to disk
> usage.
>
> I understand the point you are trying to make. However, this contradicts a
> statement which you made earlier. If we want the number of directories to
> affect the garbage collection "locally" then the only way is to re-order
> the sequence of directories which were scheduled for GC by making their
> delay a function of the number of executor directories. I will post a
> detailed solution (if I am able to figure out one) on the ticket before
> delving into the code next time.
>
> In the meanwhile, I will take up simpler issues to solve.
>
> Thanks for all the help and your time.
>
> Best Regards,
>
> On 4 March 2015 at 08:06, Benjamin Mahler <[email protected]>
> wrote:
>
>> Hey Ritwik, please hold off dealing with review comments. The current
>> approach here still seems to be not quite what we want.
>>
>> In the current patch, you're speeding up garbage collection "globally"
>> when an individual directory has too many child directories. This approach
>> works fine with "disk usage" as it is a global constraint we must deal
>> with. However, the "link count" is a per-directory constraint. This means
>> we want to be able to purge on a per-directory basis rather than globally.
>>
>> I apologize that I haven't had time to provide specific implementation
>> guidance and it's looking like this isn't a "newbie" ticket after all.
>> Bernd is pitching some time in here to help. In the meantime you might want
>> to pick up some other tickets that will require less shepherding from us!
>>
>> On Sat, Feb 28, 2015 at 4:54 AM, Bernd Mathiske <[email protected]>
>> wrote:
>>
>>>    This is an automatically generated e-mail. To reply, visit:
>>> https://reviews.apache.org/r/30014/
>>>
>>> First pass.
>>>
>>>
>>>    src/slave/constants.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880941#file880941line64> 
>>> (Diff
>>> revision 5)
>>>
>>> extern const Duration REGISTRATION_BACKOFF_FACTOR;
>>>
>>>    64
>>>
>>> extern const double HARD_LINKS_THRESHOLD;
>>>
>>>   The other constants have a comment. Why not this one?
>>>
>>>
>>>    src/slave/flags.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line158> 
>>> (Diff
>>> revision 5)
>>>
>>> public:
>>>
>>>    158
>>>
>>>         "    0.0, min((1.0 - gc_disk_headroom - disk usage),\n"
>>>
>>>   Danger: is "-" left-associative here or right-associative? Better not to 
>>> make any assumptions and put extra parentheses.
>>>
>>> It seems to me that the first expression is meant to be left-associative 
>>> and the second right-associative. See more discussion why this is so below.
>>>
>>> Where is max_framework_link_usage defined? This comes out of context here.
>>>
>>>
>>>    src/slave/flags.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line170> 
>>> (Diff
>>> revision 5)
>>>
>>> public:
>>>
>>>    170
>>>
>>>         "    0.0, min((1.0 - gc_disk_headroom - disk usage),\n"
>>>
>>>   Same as above.
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line237> 
>>> (Diff
>>> revision 5)
>>>
>>> public:
>>>
>>>   236
>>>
>>>   // Garbage collects the directories based on the current disk usage.
>>>
>>> 237
>>>
>>>   // Garbage collects the directories based on the current disk usage and 
>>> number
>>>
>>>   It is probably worth mentioning here that you mean the maximum current 
>>> number of executor directories  in any active framework.
>>>
>>> While you are changing this sentence here, could you maybe also improve on 
>>> the phrasing "the directories"? Using the definite article "the" without 
>>> referring to anything in context makes it unclear to the reader what 
>>> directories are meant.
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line242> 
>>> (Diff
>>> revision 5)
>>>
>>> public:
>>>
>>>   240
>>>
>>>   void _checkDiskUsage(const process::Future<double>& usage);
>>>
>>> 242
>>>
>>>   void _checkGCParameters(const process::Future<GCParameters>& gc_params);
>>>
>>>
>>>    1.
>>>
>>>    We don't use snake case, we use camel case for variable naming. Please 
>>> also correct all other variables in this review.
>>>    2.
>>>
>>>    No need to abbreviate "parameters" to "params".
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line639> 
>>> (Diff
>>> revision 5)
>>>
>>> 639
>>>
>>>   // Getters
>>>
>>>   Redundant comment.
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line640> 
>>> (Diff
>>> revision 5)
>>>
>>> 640
>>>
>>>   double getDiskUsage() const;
>>>
>>>   Why use getters and setters here at all? What is the benefit if all the 
>>> setters are public?
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line643> 
>>> (Diff
>>> revision 5)
>>>
>>> 643
>>>
>>>   // Setters
>>>
>>>   Redundant comment.
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line648> 
>>> (Diff
>>> revision 5)
>>>
>>> 648
>>>
>>>   // 'disk_usage' denotes a fraction of the disk that is currently being 
>>> used
>>>
>>>   "a fraction"? What other fractions are there? Do you mean "the fraction"?
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line649> 
>>> (Diff
>>> revision 5)
>>>
>>> 649
>>>
>>>   // by the slave process. This usage is defined as a fraction of a limit
>>>
>>>   "This usage is defined as a fraction of a limit specified by a flag."
>>>
>>> This is too nebulous. Suggestion: reference/point to a place with the 
>>> actual definition.
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line652> 
>>> (Diff
>>> revision 5)
>>>
>>> 652
>>>
>>>   // expressed as a fraction of the limit on the maximum number of
>>>
>>>   "the maximum number of subdirectories"
>>>
>>> Please add " in any given parent directory" to be extra clear. Otherwise 
>>> one could possibly read this mistakenly as:
>>>
>>> "the maximum number of subdirectories anywhere in a disk volume".
>>>
>>>
>>>    src/slave/slave.hpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line656> 
>>> (Diff
>>> revision 5)
>>>
>>> 656
>>>
>>>   Try<double> disk_usage, max_link_usage;
>>>
>>>
>>>    1.
>>>
>>>    Please declare these two members separately, with separate comments.
>>>    2.
>>>
>>>    "max_link_usage"? This sounds like a limit, not like a current gauge 
>>> which it is. Suggestion: "topLinkUsage".
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3596> 
>>> (Diff
>>> revision 5)
>>>
>>> void Slave::registerExecutorTimeout(
>>>
>>>    3596
>>>
>>>       0.0, std::min((1.0 - flags.gc_disk_headroom - 
>>> gc_params.getDiskUsage()),
>>>
>>>   The indentation in the next line makes this hard to read. I'd say line 
>>> this up this way:
>>>
>>> return flags.gc_delay * std::max(
>>>       0.0, std::min((1.0 - ...),
>>>                     (1.0 - ...;
>>>
>>> Even better: use variables with good descriptive naming for intermediate 
>>> expression results. Or break out the formluae into small functions, even if 
>>> they do not occur in multiple places.
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3597> 
>>> (Diff
>>> revision 5)
>>>
>>> void Slave::registerExecutorTimeout(
>>>
>>>    3597
>>>
>>>           (1.0 - flags.gc_disk_headroom - gc_params.getMaxLinkUsage())));
>>>
>>>
>>>    1. This can't be right. Subtracting link usage from disk head room?
>>>
>>> Such a glitch is exactly why I would like to suggest writing a unit test 
>>> later on. Otherwise we may have this feature implemented, but it may not be 
>>> working at all.
>>>
>>>    1. Left or right-associative use of "-" intended here? Suggestion: use 
>>> extra parentheses! It looks like left-associative is correct here, but it 
>>> won't be once you have replaced gc_disk_headroom with hard_links_threshold 
>>> (hardLinksThreshold)... Then you have
>>>
>>> (1.0 - 0.8) - maxLinkUsage // seems wrong
>>>
>>> Proposal: Make these definitions more parallel in meaning. Use a headroom 
>>> concept for hard links as well or change disk headroom to a disk usage 
>>> threshold. I'd opt for the former, since it would touch less code.
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3637> 
>>> (Diff
>>> revision 5)
>>>
>>> 3637
>>>
>>>   GCParameters params;
>>>
>>>   params -> parameters
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3638> 
>>> (Diff
>>> revision 5)
>>>
>>> 3638
>>>
>>>   params.setDiskUsage(fs::usage(flags.work_dir));
>>>
>>>   insert blank line above and below this line
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3640> 
>>> (Diff
>>> revision 5)
>>>
>>> 3640
>>>
>>>     std::string executors_parent_path = paths::getExecutorsParentPath(
>>>
>>>   std::string -> string
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3642> 
>>> (Diff
>>> revision 5)
>>>
>>> 3642
>>>
>>>     struct stat s;
>>>
>>>   insert blank line above this one and swap the two declarations, because 
>>> lstat uses s, not limit.
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3645> 
>>> (Diff
>>> revision 5)
>>>
>>> 3645
>>>
>>>       params.setMaxLinkUsage(ErrnoError(
>>>
>>>   Hm. Is it really the best strategy to report nothing at all just because 
>>> we are unsure about one single framweork?
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3650> 
>>> (Diff
>>> revision 5)
>>>
>>> 3650
>>>
>>>       limit = ::pathconf(executors_parent_path.c_str(), _PC_LINK_MAX);
>>>
>>>   Why not declare "limit" here? Why declare it in an outer scope and 
>>> needlessly initialize it there?
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3651> 
>>> (Diff
>>> revision 5)
>>>
>>> 3651
>>>
>>>       if (limit < 0 && errno != 0) {
>>>
>>>   What about "limit < 0" and "errno == 0"? This is a possible response from 
>>> pathconf. Then you keep going, dividing by zero in line 3656.
>>>
>>> Did you mean this?
>>>
>>> "limit < 0 || errno != 0"
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3656> 
>>> (Diff
>>> revision 5)
>>>
>>> 3656
>>>
>>>       double link_usage_fraction = 1.0 * s.st_nlink / limit;
>>>
>>>   What if we are on a system where the limit gets ignored and mkdir still 
>>> works and GC for some reason has not caught up yet? Do we get usages above 
>>> 1.0 then? Is this expected and handled well in therest of the code?
>>>
>>>
>>>    src/slave/slave.cpp
>>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3658> 
>>> (Diff
>>> revision 5)
>>>
>>> 3658
>>>
>>>           std::max(params.getMaxLinkUsage(), link_usage_fraction));
>>>
>>>   Won't this crash if we have set maxLinkUsage to Error in line 3645?
>>>
>>>
>>> .
>>>
>>> - Bernd Mathiske
>>>
>>> On February 27th, 2015, 2:48 p.m. PST, Ritwik Yadav wrote:
>>>   Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone.
>>> By Ritwik Yadav.
>>>
>>> *Updated Feb. 27, 2015, 2:48 p.m.*
>>>  *Bugs: * MESOS-391 <https://issues.apache.org/jira/browse/MESOS-391>
>>>  *Repository: * mesos
>>> Description
>>>
>>> A fix to enable the slave garbage collector to take the number of hard 
>>> links (st_nlinks) into account when creating new directories.
>>>
>>>   Testing
>>>
>>> Mesos builds sucessfully and all existing tests pass.
>>>
>>>   Diffs
>>>
>>>    - src/slave/constants.hpp (fd1c1aba0aa62372ab399bee5709ce81b8e92cec)
>>>    - src/slave/constants.cpp (2a99b1105af0e2d62b956fa96988f477937a46bd)
>>>    - src/slave/flags.hpp (56b25caf3901b38bdecb50310e8bcae0b114efa8)
>>>    - src/slave/paths.hpp (1618439d728ded347ec75317ce8dd998acd7ee94)
>>>    - src/slave/paths.cpp (01ea856aa2e628d4aee5fd31f7e49d147f740e8f)
>>>    - src/slave/slave.hpp (7b58cade9ada5f0a56b4c60c502eb907ccae33b4)
>>>    - src/slave/slave.cpp (9f31fa46304398e8f87b41b55d8f4cfd4aba10b9)
>>>    - src/tests/gc_tests.cpp (deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf)
>>>
>>> View Diff <https://reviews.apache.org/r/30014/diff/>
>>>
>>
>>
>
>
> --
> *Ritwik Yadav*
>
> Department of Computer Science and Engineering,
> Indian Institute of Technology,
> Kharagpur.
>
> Cell: +91-9635-152346
> Twitter: @iRitwik <https://twitter.com/#%21/iRitwik>
>

Reply via email to