Another point: I don't see a functional benefit in avoiding a change of ownership for pass-through operators. Consider the following use cases:

Example I
- A single batch of size 8MB is received at time t0 and is then passed through a set of pass-through operators
- At time t1 it is owned by operator Opr1, at time t2 by operator Opr2, and so forth
- Assume we report memory usage at times t0 - t2; this is what will be seen
  - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
  - t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
  - t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)

Example II
- Multiple batches of size 8MB are received at times t0 - t2 and are then passed through a set of pass-through operators
- Each batch is owned by operator Opr1 one step after it arrives, by operator Opr2 one step later, and so forth
- Assume we report memory usage at times t0 - t2; this is what will be seen
  - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
  - t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
  - t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)

The key thing is that we clarify our reporting metrics so that users do not draw the wrong conclusions.

Regards,
Salim

On Fri, Apr 27, 2018 at 11:47 PM, salim achouche <sachouc...@gmail.com> wrote:

> Vlad,
>
> - My understanding is that operators need to take ownership of incoming buffers (using the vector method transferTo())
> - My view is not that receivers are pass-through; instead, I feel that sender & receiver operators should focus on their business logic
> - It just happens that the unordered-receiver does very little (deserializes the batch through the BatchLoader)
> - Contrast this with the merge-receiver which needs to consume data from multiple inputs to provide ordered batches
> - The operator implementation will dictate how many batches are consumed (this should have nothing to do with communication concerns)
> - Intricacies of buffering, acking, back-pressuring, etc. are ideally left to a communication module
>
> My intent is to consistently report on resource usage (I am fine if we exclude pass-through operators as long as we do it consistently).
> The next enhancement that I am planning to do is to report on the fragment buffered batches. This will enable us to account for such resources when analyzing memory usage.
>
> On Fri, Apr 27, 2018 at 9:50 PM, vrozov <g...@git.apache.org> wrote:
>
>> Github user vrozov commented on the issue:
>>
>>     https://github.com/apache/drill/pull/1237
>>
>>     IMO, it will be good to understand what other operators do as well. For example what Project or Filter operators do. Do they take ownership of incoming batches? And if they do, when is the ownership taken?
>>
>>     I do not suggest that we change how Sender and Receiver control **all** aspects of communication, at least not as part of this JIRA/PR. The difference in my and your approach is whether or not UnorderedReceiver and other receivers are pass-through operators. My view is that receivers are not pass-through operators and they are buffering operators as they receive batches from the network and buffer them before downstream operators are ready to consume those batches. In your view, receivers are pass-through operators that get batches from fragment queue or some other queue and pass them to downstream. As there is no wait and no processing between getting a batch from fragment queue and passing it to the next operator, I don't see why a receiver needs to take the ownership.
>>
>> ---
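
For reference, the numbers reported in Example I and Example II at the top of this message can be reproduced with a small toy model. This is only a sketch of the accounting idea, not Drill code: the `simulate` function, the `OWNERS` tuple, and the rule that a batch moves exactly one owner downstream per time step (and stays with the last owner afterwards) are all assumptions made for illustration.

```python
# Toy model of per-owner memory reporting under ownership transfer.
# Hypothetical names; this does not use or mirror any Drill API.

OWNERS = ("fragment", "opr-1", "opr-2")
BATCH_MB = 8  # every batch is assumed to be 8MB, as in the examples


def simulate(arrival_times, ticks):
    """Return one usage tuple (in MB, ordered like OWNERS) per time step."""
    snapshots = []
    for t in range(ticks):
        usage = [0] * len(OWNERS)
        for arrived in arrival_times:
            if arrived <= t:
                # Assumption: a batch moves one owner downstream per step,
                # and the last owner keeps it.
                owner_idx = min(t - arrived, len(OWNERS) - 1)
                usage[owner_idx] += BATCH_MB
        snapshots.append(tuple(usage))
    return snapshots


# Example I: a single batch arrives at t0.
example_1 = simulate(arrival_times=[0], ticks=3)
# Example II: one batch arrives at each of t0, t1, t2.
example_2 = simulate(arrival_times=[0, 1, 2], ticks=3)

print(example_1)  # [(8, 0, 0), (0, 8, 0), (0, 0, 8)]
print(example_2)  # [(8, 0, 0), (8, 8, 0), (8, 8, 8)]
```

In this model each batch is charged to exactly one owner at a time, so the per-owner figures at any time step add up to the total memory actually held; that is the property the reporting metrics would need to make clear to users.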