I missed this as a discussion since it had the title of a GitHub discussion. Comments below.
On Friday, April 27, 2018, 5:42:37 PM PDT, salim achouche <sachouc...@gmail.com> wrote:

> Another point, I don't see a functional benefit from avoiding a change of ownership for pass-through operators.

Please read my responses to Vlad. Change of ownership is critical to how Drill's memory allocators work today. Of course, you are right that, if we could do a new design (perhaps based on the budget-based approach), we would not need the ownership stuff. But without ownership changes now, the existing allocators will simply cause us all manner of problems. In particular, none of the spill logic added to Sort or HashAgg would work, since it relies on a properly functioning allocator.

> Consider the following use-cases:
>
> Example I -
> - Single batch of size 8MB is received at time t0 and then is passed through a set of pass-through operators
> - At time t1 owned by operator Opr1, time t2 owned by operator Opr2, and so forth
> - Assume we report memory usage at time t0 - t2; this is what will be seen
>   - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
>   - t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
>   - t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)

You are right. Each minor fragment is single-threaded: only one operator is "active" at a time as control passes from downstream to upstream operators. (Yes, this is the unfortunate Drill terminology: downstream calls upstream, and data flows in the direction opposite to calls.)

This single-threaded model is the insight behind the budget-based memory model. But to get there, we must consider the whole system; unfortunately, we can't just make localized changes.
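
To make the Example I timeline concrete, here is a minimal Java sketch of ownership transfer under the single-threaded model. The names ToyAllocator, ToyBatch, and OwnershipDemo are made up for illustration and are not Drill's actual allocator classes; the only assumption is that handing a batch to a new owner credits the old allocator and charges the new one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy allocator: only tracks bytes currently charged to it.
class ToyAllocator {
    final String name;
    private long allocated;

    ToyAllocator(String name) { this.name = name; }

    long allocated() { return allocated; }
    void charge(long bytes) { allocated += bytes; }
    void release(long bytes) { allocated -= bytes; }
}

// Hypothetical batch that is charged to exactly one allocator at a time.
class ToyBatch {
    final long sizeBytes;
    private ToyAllocator owner;

    ToyBatch(long sizeBytes, ToyAllocator owner) {
        this.sizeBytes = sizeBytes;
        this.owner = owner;
        owner.charge(sizeBytes);
    }

    // Ownership change: credit the old owner, charge the new one.
    void transferTo(ToyAllocator newOwner) {
        owner.release(sizeBytes);
        newOwner.charge(sizeBytes);
        owner = newOwner;
    }
}

class OwnershipDemo {
    private static long[] snapshot(ToyAllocator... allocators) {
        long[] bytes = new long[allocators.length];
        for (int i = 0; i < allocators.length; i++) {
            bytes[i] = allocators[i].allocated();
        }
        return bytes;
    }

    // Returns one snapshot per time step: {fragment, opr-1, opr-2} in bytes.
    static List<long[]> run() {
        final long mb = 1024L * 1024L;
        ToyAllocator fragment = new ToyAllocator("fragment");
        ToyAllocator opr1 = new ToyAllocator("opr-1");
        ToyAllocator opr2 = new ToyAllocator("opr-2");
        List<long[]> snapshots = new ArrayList<>();

        ToyBatch batch = new ToyBatch(8 * mb, fragment);    // t0: batch arrives
        snapshots.add(snapshot(fragment, opr1, opr2));
        batch.transferTo(opr1);                             // t1: opr-1 is active
        snapshots.add(snapshot(fragment, opr1, opr2));
        batch.transferTo(opr2);                             // t2: opr-2 is active
        snapshots.add(snapshot(fragment, opr1, opr2));
        return snapshots;
    }

    public static void main(String[] args) {
        for (long[] s : run()) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```

Because only one operator is active at a time, the 8MB is charged to exactly one allocator in every snapshot, which is what lets each allocator enforce its own limit.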
> Example II -
> - Multiple batches of size 8MB are received at time t0 - t2 and then are passed through a set of pass-through operators
> - At time t1 owned by operator Opr1, time t2 owned by operator Opr2, and so forth
> - Assume we report memory usage at time t0 - t2; this is what will be seen
>   - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
>   - t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
>   - t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)

The above can, AFAIK, never happen. A batch is owned by an operator, not a fragment. A batch passes up the operator tree until it reaches the top or until it reaches a "buffering" operator such as Sort.

> The key thing is that we clarify our reporting metrics so that users do not make the wrong conclusions.

This is a good thing. But we need to understand how the batches flow and report that accurately. Further, we must deeply understand this flow if we want to move to budget-based allocation without per-operator allocators.

Let's separate various concepts. First is the instantaneous "stats" maintained by each operator allocator to enforce memory limits. Second is the total data that has passed through an operator. Third is the maximum memory used at any one time over the life of the operator. These are all very useful, but they measure different things.

Thanks,
- Paul
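
P.S. A rough Java sketch of how the three measurements differ, in case it helps the discussion. OperatorMemoryStats and MetricsDemo are hypothetical names invented here, not Drill's stats classes; the sketch assumes a pass-through operator releases each batch before receiving the next, while a buffering operator such as Sort holds on to everything it receives.

```java
// Hypothetical per-operator counters for the three concepts.
class OperatorMemoryStats {
    private long currentBytes; // instantaneous: what a limit check would look at
    private long totalBytes;   // cumulative: all data that has passed through
    private long peakBytes;    // high-water mark of currentBytes

    void onReceive(long bytes) {
        currentBytes += bytes;
        totalBytes += bytes;
        peakBytes = Math.max(peakBytes, currentBytes);
    }

    void onRelease(long bytes) {
        currentBytes -= bytes;
    }

    long currentBytes() { return currentBytes; }
    long totalBytes() { return totalBytes; }
    long peakBytes() { return peakBytes; }
}

class MetricsDemo {
    static final long MB = 1024L * 1024L;

    // Pass-through operator: holds one batch at a time, then hands it on.
    static OperatorMemoryStats passThrough(int batches, long batchBytes) {
        OperatorMemoryStats stats = new OperatorMemoryStats();
        for (int i = 0; i < batches; i++) {
            stats.onReceive(batchBytes);
            stats.onRelease(batchBytes);
        }
        return stats;
    }

    // Buffering operator: keeps every batch it has received so far.
    static OperatorMemoryStats buffering(int batches, long batchBytes) {
        OperatorMemoryStats stats = new OperatorMemoryStats();
        for (int i = 0; i < batches; i++) {
            stats.onReceive(batchBytes);
        }
        return stats;
    }

    public static void main(String[] args) {
        OperatorMemoryStats p = passThrough(3, 8 * MB);
        OperatorMemoryStats b = buffering(3, 8 * MB);
        System.out.printf("pass-through: current=%d total=%d peak=%d%n",
            p.currentBytes(), p.totalBytes(), p.peakBytes());
        System.out.printf("buffering:    current=%d total=%d peak=%d%n",
            b.currentBytes(), b.totalBytes(), b.peakBytes());
    }
}
```

With three 8MB batches, both operators see 24MB of total data, but the pass-through operator peaks at 8MB while the buffering one peaks at 24MB, so reporting only one of the three numbers can mislead.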