> On Feb. 21, 2014, 12:12 a.m., Benjamin Hindman wrote:
> > 3rdparty/libprocess/3rdparty/stout/include/stout/interval.hpp, line 107
> > <https://reviews.apache.org/r/18093/diff/5/?file=499572#file499572line107>
> >
> >     IMHO, providing element iteration can be sneaky and cause unintended 
> > performance issues. For example, consider a using IntervalSet for a very 
> > large interval which turns iteration (or the construction of a vector from 
> > such iteration) into an extremely expensive loop or a huge memory 
> > footprint. If we can get away without element iteration that would be 
> > preferred (but passing IntervalSet everyplace we used to use vector) or at 
> > least providing "interval" iteration instead of "element" (but let's not do 
> > this unless we really have a need.
> 
> Jie Yu wrote:
>     Killed.
> 
> Ben Mahler wrote:
>     I see you've re-added element iteration because you're other patch needed 
> it. I would personally rather see Interval based iteration as we went through 
> earlier in this review to avoid the issues of iterating over large 
> IntervalSets:
>     
>     foreach (const Interval& interval, intervalSet) {
>       // Deal with interval, preferably not with all Bound combinations.
>     }
>     
>     This is safer than element iteration and would be sufficient for your 
> other patch, right?
> 
> Jie Yu wrote:
>     Yes. Without the element iteration, the code in src/log/catchup.cpp 
> (https://reviews.apache.org/r/18368/diff/#1.14) will become very ugly: you 
> probably need to build a vector from the foreach loop above and iterate over 
> that vector. That might blow up the memory if the interval is large. Using 
> element iterator at least will not blow up the memory.
>     
>     Since we've already made using element iteration very explicit, I prefer 
> leave it there for this case.
>     
>     Let me know your thoughts!
> 
> Ben Mahler wrote:
>     So the catchup code presumably is not dealing with a large set of 
> positions, otherwise iterating over all of them would be prohibitively 
> expensive? I'm ok with leaving element iteration given how explicit you've 
> made it. We could leave a TODO to implement Interval iteration instead but it 
> looks like your catchup code will be a bit messier if it has to deal with 
> Intervals coming from begin() / end().
> 
> Jie Yu wrote:
>     Per BenM's request, I am sketching here what the catchup code would look 
> like w/ and w/o element iteration:
>     
>     1) w/ element iteration:
>     
>     class BulkCatchUpProcess : public Process<BulkCatchUpProcess> {
>       ...
>       virtual void initialize() {
>         ...
>         it = positions.elementsBegin();
>         catchup();
>       }
>     
>       // Called before we try to catch-up a position.
>       void catchup() {
>         if (it == positions.elementsEnd()) {
>           promise.set(Nothing());
>           terminate(self());
>           return;
>         }
>         ...
>       }
>     
>       // Called after a position has been caught-up.
>       void succeeded() {
>         ++it;
>         ...
>       }
>       
>       const IntervalSet<uint64_t> positions;
>       IntervalSet<uint64_t>::element_const_iterator it;
>     };
>     
>     2) w/o element iteration. Assume that we have interval iteration and 
> standardized bounds (i.e., [begin, end) ).
>     
>     class BulkCatchUpProcess : public Process<BulkCatchUpProcess> {
>       ...
>       virtual void initialize() {
>         ...
>         it = positions.begin();
>         if (it != positions.end()) {
>           current = it->begin();
>         }
>     
>         catchup();
>       }
>     
>       // Called before we try to catch-up a position.
>       void catchup() {
>         if (current == it->end()) {
>           if (++it == positions.end()) {
>             promise.set(Nothing());
>             terminate(self());
>             return;
>           } else {
>             current = it->begin();
>           }
>         }
>         ...
>       }
>     
>       // Called after a position has been caught-up.
>       void succeeded() {
>         ++current;
>         ...
>       }
>       
>       const IntervalSet<uint64_t> positions;
>       IntervalSet<uint64_t>::iterator it;
>       uint64_t current;
>     };

Thanks! And some options for avoiding the need for IntervalSet in catchup:

1. Provide a cast operator for IntervalSet -> std::set so that catchup can take 
an std::set instead.
2. Provide a more explicit 'std::set elements() const' method for converting to 
std::set for passing in to catchup.

What I'm concerned about is that we're saying: use IntervalSet when dealing 
with possibly very large sets / ranges of numbers. The caveat being, be careful 
to avoid iterating over the elements since it might take a very long time! It 
seems like we should consider converting away from IntervalSet when we know the 
space is not large. For example, in catchup, why bother using an IntervalSet? 
It seems the only reason is that the caller wants to pass an IntervalSet in.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18093/#review35111
-----------------------------------------------------------


On Feb. 25, 2014, 1:08 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18093/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2014, 1:08 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Bugs: MESOS-993
>     https://issues.apache.org/jira/browse/MESOS-993
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/3rdparty/stout/Makefile.am 5d5a760 
>   3rdparty/libprocess/3rdparty/stout/include/stout/interval.hpp PRE-CREATION 
>   3rdparty/libprocess/3rdparty/stout/tests/interval_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18093/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> repeat 1000 times for the IntervalTest
> 
> Repeating all tests (iteration 100) . . .
> 
> Note: Google Test filter = *Interval*
> [==========] Running 7 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 7 tests from IntervalTest
> [ RUN      ] IntervalTest.Interval
> [       OK ] IntervalTest.Interval (0 ms)
> [ RUN      ] IntervalTest.Addition
> [       OK ] IntervalTest.Addition (0 ms)
> [ RUN      ] IntervalTest.Subtraction
> [       OK ] IntervalTest.Subtraction (0 ms)
> [ RUN      ] IntervalTest.Intersection
> [       OK ] IntervalTest.Intersection (0 ms)
> [ RUN      ] IntervalTest.LargeInterval
> [       OK ] IntervalTest.LargeInterval (0 ms)
> [ RUN      ] IntervalTest.ElementIteration
> [       OK ] IntervalTest.ElementIteration (0 ms)
> [ RUN      ] IntervalTest.IntervalIteration
> [       OK ] IntervalTest.IntervalIteration (0 ms)
> [----------] 7 tests from IntervalTest (0 ms total)
> 
> [----------] Global test environment tear-down
> [==========] 7 tests from 1 test case ran. (0 ms total)
> [  PASSED  ] 7 tests.
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

Reply via email to