Hi Ben,

I had already addressed all the comments. I was trying to figure out a way
to reproduce the issue with a failing test before posting a new diff. I am
sorry for the delay since it took me some time to figure out concepts like
containerization etc. which were used in the test corresponding to disk
usage.

I understand the point you are trying to make. However, this contradicts a
statement which you made earlier. If we want the number of directories to
affect the garbage collection "locally" then the only way is to re-order
the sequence of directories which were scheduled for GC by making their
delay a function of the number of executor directories. I will post a
detailed solution (if I am able to figure out one) on the ticket before
delving into the code next time.

In the meanwhile, I will take up simpler issues to solve.

Thanks for all the help and your time.

Best Regards,

On 4 March 2015 at 08:06, Benjamin Mahler <[email protected]> wrote:

> Hey Ritwik, please hold off dealing with review comments. The current
> approach here still seems to be not quite what we want.
>
> In the current patch, you're speeding up garbage collection "globally"
> when an individual directory has too many child directories. This approach
> works fine with "disk usage" as it is a global constraint we must deal
> with. However, the "link count" is a per-directory constraint. This means
> we want to be able to purge on a per-directory basis rather than globally.
>
> I apologize that I haven't had time to provide specific implementation
> guidance and it's looking like this isn't a "newbie" ticket after all.
> Bernd is pitching some time in here to help. In the meantime you might want
> to pick up some other tickets that will require less shepherding from us!
>
> On Sat, Feb 28, 2015 at 4:54 AM, Bernd Mathiske <[email protected]>
> wrote:
>
>>    This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/30014/
>>
>> First pass.
>>
>>
>>    src/slave/constants.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880941#file880941line64> 
>> (Diff
>> revision 5)
>>
>> extern const Duration REGISTRATION_BACKOFF_FACTOR;
>>
>>    64
>>
>> extern const double HARD_LINKS_THRESHOLD;
>>
>>   The other constants have a comment. Why not this one?
>>
>>
>>    src/slave/flags.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line158> 
>> (Diff
>> revision 5)
>>
>> public:
>>
>>    158
>>
>>         "    0.0, min((1.0 - gc_disk_headroom - disk usage),\n"
>>
>>   Danger: is "-" left-associative here or right-associative? Better not to 
>> make any assumptions and put extra parentheses.
>>
>> It seems to me that the first expression is meant to be left-associative and 
>> the second right-associative. See more discussion why this is so below.
>>
>> Where is max_framework_link_usage defined? This comes out of context here.
>>
>>
>>    src/slave/flags.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880943#file880943line170> 
>> (Diff
>> revision 5)
>>
>> public:
>>
>>    170
>>
>>         "    0.0, min((1.0 - gc_disk_headroom - disk usage),\n"
>>
>>   Same as above.
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line237> 
>> (Diff
>> revision 5)
>>
>> public:
>>
>>   236
>>
>>   // Garbage collects the directories based on the current disk usage.
>>
>> 237
>>
>>   // Garbage collects the directories based on the current disk usage and 
>> number
>>
>>   It is probably worth mentioning here that you mean the maximum current 
>> number of executor directories  in any active framework.
>>
>> While you are changing this sentence here, could you maybe also improve on 
>> the phrasing "the directories"? Using the definite article "the" without 
>> referring to anything in context makes it unclear to the reader what 
>> directories are meant.
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line242> 
>> (Diff
>> revision 5)
>>
>> public:
>>
>>   240
>>
>>   void _checkDiskUsage(const process::Future<double>& usage);
>>
>> 242
>>
>>   void _checkGCParameters(const process::Future<GCParameters>& gc_params);
>>
>>
>>    1.
>>
>>    We don't use snake case, we use camel case for variable naming. Please 
>> also correct all other variables in this review.
>>    2.
>>
>>    No need to abbreviate "parameters" to "params".
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line639> 
>> (Diff
>> revision 5)
>>
>> 639
>>
>>   // Getters
>>
>>   Redundant comment.
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line640> 
>> (Diff
>> revision 5)
>>
>> 640
>>
>>   double getDiskUsage() const;
>>
>>   Why use getters and setters here at all? What is the benefit if all the 
>> setters are public?
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line643> 
>> (Diff
>> revision 5)
>>
>> 643
>>
>>   // Setters
>>
>>   Redundant comment.
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line648> 
>> (Diff
>> revision 5)
>>
>> 648
>>
>>   // 'disk_usage' denotes a fraction of the disk that is currently being used
>>
>>   "a fraction"? What other fractions are there? Do you mean "the fraction"?
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line649> 
>> (Diff
>> revision 5)
>>
>> 649
>>
>>   // by the slave process. This usage is defined as a fraction of a limit
>>
>>   "This usage is defined as a fraction of a limit specified by a flag."
>>
>> This is too nebulous. Suggestion: reference/point to a place with the actual 
>> definition.
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line652> 
>> (Diff
>> revision 5)
>>
>> 652
>>
>>   // expressed as a fraction of the limit on the maximum number of
>>
>>   "the maximum number of subdirectories"
>>
>> Please add " in any given parent directory" to be extra clear. Otherwise one 
>> could possibly read this mistakenly as:
>>
>> "the maximum number of subdirectories anywhere in a disk volume".
>>
>>
>>    src/slave/slave.hpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880946#file880946line656> 
>> (Diff
>> revision 5)
>>
>> 656
>>
>>   Try<double> disk_usage, max_link_usage;
>>
>>
>>    1.
>>
>>    Please declare these two members separately, with separate comments.
>>    2.
>>
>>    "max_link_usage"? This sounds like a limit, not like a current gauge 
>> which it is. Suggestion: "topLinkUsage".
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3596> 
>> (Diff
>> revision 5)
>>
>> void Slave::registerExecutorTimeout(
>>
>>    3596
>>
>>       0.0, std::min((1.0 - flags.gc_disk_headroom - 
>> gc_params.getDiskUsage()),
>>
>>   The indentation in the next line makes this hard to read. I'd say line 
>> this up this way:
>>
>> return flags.gc_delay * std::max(
>>       0.0, std::min((1.0 - ...),
>>                     (1.0 - ...;
>>
>> Even better: use variables with good descriptive naming for intermediate 
>> expression results. Or break out the formluae into small functions, even if 
>> they do not occur in multiple places.
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3597> 
>> (Diff
>> revision 5)
>>
>> void Slave::registerExecutorTimeout(
>>
>>    3597
>>
>>           (1.0 - flags.gc_disk_headroom - gc_params.getMaxLinkUsage())));
>>
>>
>>    1. This can't be right. Subtracting link usage from disk head room?
>>
>> Such a glitch is exactly why I would like to suggest writing a unit test 
>> later on. Otherwise we may have this feature implemented, but it may not be 
>> working at all.
>>
>>    1. Left or right-associative use of "-" intended here? Suggestion: use 
>> extra parentheses! It looks like left-associative is correct here, but it 
>> won't be once you have replaced gc_disk_headroom with hard_links_threshold 
>> (hardLinksThreshold)... Then you have
>>
>> (1.0 - 0.8) - maxLinkUsage // seems wrong
>>
>> Proposal: Make these definitions more parallel in meaning. Use a headroom 
>> concept for hard links as well or change disk headroom to a disk usage 
>> threshold. I'd opt for the former, since it would touch less code.
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3637> 
>> (Diff
>> revision 5)
>>
>> 3637
>>
>>   GCParameters params;
>>
>>   params -> parameters
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3638> 
>> (Diff
>> revision 5)
>>
>> 3638
>>
>>   params.setDiskUsage(fs::usage(flags.work_dir));
>>
>>   insert blank line above and below this line
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3640> 
>> (Diff
>> revision 5)
>>
>> 3640
>>
>>     std::string executors_parent_path = paths::getExecutorsParentPath(
>>
>>   std::string -> string
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3642> 
>> (Diff
>> revision 5)
>>
>> 3642
>>
>>     struct stat s;
>>
>>   insert blank line above this one and swap the two declarations, because 
>> lstat uses s, not limit.
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3645> 
>> (Diff
>> revision 5)
>>
>> 3645
>>
>>       params.setMaxLinkUsage(ErrnoError(
>>
>>   Hm. Is it really the best strategy to report nothing at all just because 
>> we are unsure about one single framweork?
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3650> 
>> (Diff
>> revision 5)
>>
>> 3650
>>
>>       limit = ::pathconf(executors_parent_path.c_str(), _PC_LINK_MAX);
>>
>>   Why not declare "limit" here? Why declare it in an outer scope and 
>> needlessly initialize it there?
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3651> 
>> (Diff
>> revision 5)
>>
>> 3651
>>
>>       if (limit < 0 && errno != 0) {
>>
>>   What about "limit < 0" and "errno == 0"? This is a possible response from 
>> pathconf. Then you keep going, dividing by zero in line 3656.
>>
>> Did you mean this?
>>
>> "limit < 0 || errno != 0"
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3656> 
>> (Diff
>> revision 5)
>>
>> 3656
>>
>>       double link_usage_fraction = 1.0 * s.st_nlink / limit;
>>
>>   What if we are on a system where the limit gets ignored and mkdir still 
>> works and GC for some reason has not caught up yet? Do we get usages above 
>> 1.0 then? Is this expected and handled well in therest of the code?
>>
>>
>>    src/slave/slave.cpp
>> <https://reviews.apache.org/r/30014/diff/5/?file=880947#file880947line3658> 
>> (Diff
>> revision 5)
>>
>> 3658
>>
>>           std::max(params.getMaxLinkUsage(), link_usage_fraction));
>>
>>   Won't this crash if we have set maxLinkUsage to Error in line 3645?
>>
>>
>> .
>>
>> - Bernd Mathiske
>>
>> On February 27th, 2015, 2:48 p.m. PST, Ritwik Yadav wrote:
>>   Review request for mesos, Ben Mahler, Dominic Hamon, and Vinod Kone.
>> By Ritwik Yadav.
>>
>> *Updated Feb. 27, 2015, 2:48 p.m.*
>>  *Bugs: * MESOS-391 <https://issues.apache.org/jira/browse/MESOS-391>
>>  *Repository: * mesos
>> Description
>>
>> A fix to enable the slave garbage collector to take the number of hard links 
>> (st_nlinks) into account when creating new directories.
>>
>>   Testing
>>
>> Mesos builds sucessfully and all existing tests pass.
>>
>>   Diffs
>>
>>    - src/slave/constants.hpp (fd1c1aba0aa62372ab399bee5709ce81b8e92cec)
>>    - src/slave/constants.cpp (2a99b1105af0e2d62b956fa96988f477937a46bd)
>>    - src/slave/flags.hpp (56b25caf3901b38bdecb50310e8bcae0b114efa8)
>>    - src/slave/paths.hpp (1618439d728ded347ec75317ce8dd998acd7ee94)
>>    - src/slave/paths.cpp (01ea856aa2e628d4aee5fd31f7e49d147f740e8f)
>>    - src/slave/slave.hpp (7b58cade9ada5f0a56b4c60c502eb907ccae33b4)
>>    - src/slave/slave.cpp (9f31fa46304398e8f87b41b55d8f4cfd4aba10b9)
>>    - src/tests/gc_tests.cpp (deaa6b1b6c32ae6d153229248d7d4f57caa0ebcf)
>>
>> View Diff <https://reviews.apache.org/r/30014/diff/>
>>
>
>


-- 
*Ritwik Yadav*

Department of Computer Science and Engineering,
Indian Institute of Technology,
Kharagpur.

Cell: +91-9635-152346
Twitter: @iRitwik <https://twitter.com/#%21/iRitwik>

Reply via email to