I missed that this was a discussion since it had the title of a GitHub discussion. 
Comments below.

    On Friday, April 27, 2018, 5:42:37 PM PDT, salim achouche 
<sachouc...@gmail.com> wrote:  
 
 > Another point, I don't see a functional benefit from avoiding a change of
ownership for pass-through operators. 

Please read my responses to Vlad. Change of ownership is critical to how 
Drill's memory allocators work today. Of course, you are right that, if we 
could do a new design (perhaps based on the budget-based approach), we would 
not need the ownership stuff. But, without ownership changes now, the existing 
allocators will simply cause us all manner of problems. In particular, none of 
the spill logic added to Sort or HashAgg would work, as it relies on a 
properly functioning allocator.

> Consider the following use-cases:

> Example I -
> - Single batch of size 8MB is received at time t0 and then is passed
> through a set of pass-through operators
> - At time t1 owned by operator Opr1, at time t2 owned by operator Opr2, and
> so forth
> - Assume we report memory usage at times t0 - t2; this is what will be seen
> - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
> - t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
> - t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)

You are right. Each minor fragment is single-threaded: only one operator is 
"active" at a time as control passes from downstream to upstream operators. 
(Yes, this is the unfortunate Drill terminology: downstream calls upstream, 
data flows in the direction opposite to calls.)
This single-threaded model is the insight behind the budget-based memory model. 
But, to get there, we must consider the whole system; we can't just make 
localized changes, unfortunately.
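
For illustration only, a budget-based alternative might look roughly like the 
sketch below: a single counter tracked per minor fragment, relying on the fact 
that only one operator runs at a time. The FragmentBudget class is 
hypothetical, not an existing Drill class:

    // Hypothetical fragment-level budget. Because a minor fragment is
    // single-threaded, one counter can cover every live batch no matter
    // which operator happens to hold it at the moment.
    class FragmentBudget {
      private final long budget;  // total memory granted to this minor fragment
      private long inUse;         // bytes held by all operators combined

      FragmentBudget(long budget) {
        this.budget = budget;
      }

      boolean tryReserve(long bytes) {
        if (inUse + bytes > budget) {
          return false;           // caller must spill, wait, or fail the query
        }
        inUse += bytes;
        return true;
      }

      void release(long bytes) {
        inUse -= bytes;
      }
    }

This only works if every allocation and release in the fragment goes through 
that one budget, which is exactly why it is a whole-system change rather than 
a localized one.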

> Example II -
> - Multiple batches of size 8MB are received at times t0 - t2 and then are
> passed through a set of pass-through operators
> - At time t1 owned by operator Opr1, at time t2 owned by operator Opr2, and
> so forth
> - Assume we report memory usage at times t0 - t2; this is what will be seen
> - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
> - t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
> - t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)

The above can, AFAIK, never happen. A batch is owned by an operator, not a 
fragment. A batch passes up the operator tree until it reaches the top or until 
it reaches a "buffering" operator such as Sort.
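
Roughly speaking (hypothetical interfaces below, not Drill's actual RecordBatch 
API), a pass-through operator hands its incoming batch straight to whoever 
called it, so exactly one operator owns the batch at any instant, while a 
buffering operator such as Sort accumulates batches and so accumulates memory:

    // Hypothetical iterator-style operators: the downstream operator calls
    // next() on its upstream child, and the batch flows back up the tree.
    interface Operator {
      Batch next();               // returns the next batch, or null at end
    }

    class Batch { /* record batch payload, elided */ }

    // Pass-through: takes ownership of the incoming batch, does its work,
    // and hands the same batch (and its memory) to whoever called next().
    class PassThroughOp implements Operator {
      private final Operator upstream;
      PassThroughOp(Operator upstream) { this.upstream = upstream; }

      @Override
      public Batch next() {
        Batch incoming = upstream.next();
        // ... project or filter in place ...
        return incoming;          // ownership moves to the caller
      }
    }

    // Buffering: accumulates batches, so memory piles up here rather than
    // flowing on up the tree; this is where spilling becomes necessary.
    class SortOp implements Operator {
      private final Operator upstream;
      private final java.util.List<Batch> buffered = new java.util.ArrayList<>();
      SortOp(Operator upstream) { this.upstream = upstream; }

      @Override
      public Batch next() {
        Batch incoming;
        while ((incoming = upstream.next()) != null) {
          buffered.add(incoming); // holds memory until the sort completes
        }
        // ... sort and merge output batches, elided ...
        return buffered.isEmpty() ? null : buffered.remove(0);
      }
    }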


> The key thing is that we clarify our reporting metrics so that users do not
make the wrong conclusions.

This is a good thing. But, we need to understand how the batches flow and 
report that accurately. Further, we must deeply understand this flow if we want 
to move to budget-based allocation without per-operator allocators.

Let's separate various concepts. First is the instantaneous "stats" maintained 
by each operator allocator to enforce memory limits. Second is the total data 
that has passed through an operator. Third is the maximum memory used at any 
one time over the life of the operator.

These are all very useful, but they measure different things.
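
A tiny sketch of the distinction, using made-up names rather than Drill's 
actual stats classes:

    // Hypothetical per-operator stats: three different numbers that are
    // easy to conflate in a UI.
    class OperatorMemoryStats {
      private long current;  // instantaneous bytes owned (what the allocator enforces)
      private long total;    // cumulative bytes that have passed through
      private long peak;     // high-water mark over the operator's life

      void onBatchReceived(long bytes) {
        current += bytes;
        total += bytes;
        peak = Math.max(peak, current);
      }

      void onBatchReleased(long bytes) {
        current -= bytes;
      }

      long currentBytes() { return current; }
      long totalBytes()   { return total; }
      long peakBytes()    { return peak; }
    }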

Thanks,
- Paul

  
