Hi Tim,
Here's a bit more background info from my experience.
As Sorabh said, the downstream operator has visibility to the incoming record
batch regardless of ownership.
If the downstream operator will retain the incoming vectors (buffers), then it
must take ownership of them. This is the case for pass-through operators such
as filters, or for buffering operators such as sort.
If the downstream operator will observe, but not reuse, the incoming data, then
ownership is (mostly) moot. Sounds like this is the case for HashJoin.
The key item I wish to add here is timing. Accepting ownership of the incoming
buffer shifts memory accounting from the incoming to the downstream operator.
As we have seem over and over, this can then cause the downstream operator to
suffer OOM errors if, say, the downstream operator then wants to allocate an
SV2 or some such, but finds itself over the memory limit.
I can imagine that, to avoid this issue, downstream operators may delay taking
ownership of the incoming vectors as long as possible. (The sort, for example,
first checks the incoming batch size, then spills as needed, to ensure that
there capacity in the operator allocator to accept ownership.)
Other operators may have decided to never take ownership if they will simply
observe, then discard incoming batches (or vectors.) One very likely candidate
for such an approach would be the SelectionVectorRemover since it copies data
from incoming to a new set of outgoing batches. (Have not checked if, in fact,
the SVR does avoid taking ownership.)
Would be great to adjust memory management so that we can use a more uniform
approach. As we've discussed multiple times, there is no good reason for us to
hold a gun to our own head with the current per-operator allocators, especially
when we issue an OOM for memory already located.
Perhaps you have some ideas for how to improve this as part of your resource
management work?
Thanks,
- Paul
On Friday, August 3, 2018, 12:39:35 AM PDT, Timothy Farkas
<[email protected]> wrote:
Thanks for the explanation Sorabh, that clears things up for me.
Tim
On Thu, Aug 2, 2018 at 11:38 PM, Sorabh Hamirwasia <[email protected]>
wrote:
> When next is called on upstream operator then 2 things can happen:
>
> 1. Current operator either work on incoming and produces/copy the
> records to its outgoing batch. In this case there is no transfer of
> ownership of incoming batch buffer from upstream to current operator
> allocator. Current operator will allocate separate memory footprint for
> its
> outgoing batch buffer. Also current operator is supposed to release the
> incoming batch buffer once its done working on it.
> 2. Current operator does a transfer of buffers from incoming batch value
> vectors to outgoing value vectors (like in Filter, limit (see [1]),
> etc).
> In this case ownership of buffers in incoming batch is transferred to
> current operator allocator.
>
> But I have seen different operator behaving differently. For Hash Join
> since join operators has to evaluate join condition for each probe side
> row, I don't think it will do any transfers. For build side it will build
> hash table on column involved in join condition but also has to store other
> columns if in projection list of query. So probably it might do transfer
> for those columns only (haven't looked into code though).
>
> [1]:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
> com_apache_drill_blob_006dc10a88c1708b793e3a38ac52a0
> 266bb07deb_exec_java-2Dexec_src_main_java_org_apache_
> drill_exec_physical_impl_limit_LimitRecordBatch.java-23L181&d=DwIBaQ&c=
> cskdkSMqhcnjZxdQVpwTXg&r=4eQVr8zB8ZBff-yxTimdOQ&m=
> 3PkPHZrX33YZhmngIGbjYtgGn_KYzXNVUyJPEUQVFyo&s=7YyEpHwWOp2I8zQjNEdFXLu-
> QnVCq7neR7lyg7bfPVs&e=
>
> Thanks,
> Sorabh
>
> On Thu, Aug 2, 2018 at 9:43 PM, Timothy Farkas <[email protected]> wrote:
>
> > Hi All,
> >
> > What is the expected behavior for HashJoin when it calls next for its
> left
> > or right upstream record batches. Is ownership of an upstream
> > VectorContainer supposed to pass from from the left or right upstream
> > record batches to HashJoin immediately after a call to next? Or is
> > ownership of a VectorContainer supposed to stay with an upstream record
> > batch immediately after a call to next?
> >
> > Thanks,
> > Tim
> >
>