[
https://issues.apache.org/jira/browse/DRILL-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192477#comment-16192477
]
ASF GitHub Bot commented on DRILL-5839:
---------------------------------------
GitHub user ppadma opened a pull request:
https://github.com/apache/drill/pull/974
DRILL-5839: Handle Empty Batches in Merge Receiver
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ppadma/drill DRILL-5839
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/974.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #974
----
commit ae259534d5ebabfe7f64e170012bea96fd655943
Author: Padma Penumarthy <[email protected]>
Date: 2017-10-01T00:33:17Z
DRILL-5839: Handle Empty Batches in Merge Receiver
----
> Handle Empty Batches in Merge Receiver
> --------------------------------------
>
> Key: DRILL-5839
> URL: https://issues.apache.org/jira/browse/DRILL-5839
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.11.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Fix For: 1.12.0
>
>
> The merge receiver throws an exception when the first batch it receives from
> any of the senders is an empty batch (no rows and no schema). The problem is
> that the operator expects at least one batch with a schema (0 rows is OK,
> 0 columns is not) from each of its senders.
> The algorithm works as follows:
> 1. Get the first batch from each of the senders.
> 2. Create a hyper vector container from the first batch of each sender.
> 3. Add the batches from the senders to the priority queue.
> 4. Pop from the priority queue, get the index for the current batch from
>    that sender, and use it to copy from the hyper vector to the outgoing
>    vector.
> 5. When the end of a batch from a sender is reached, load the next batch
>    from that sender.
> 6. Stop when there are no more batches from any of the senders.
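
The steps above can be sketched as a k-way merge. This is only an illustration under simplifying assumptions, not Drill's actual code: each sender is modeled as a list of sorted int arrays, the "hyper vector" is a plain array with one slot per sender, and the names MergeReceiverSketch, merge and nextNonEmpty are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the merge receiver; not Drill's real classes.
class MergeReceiverSketch {

  // Returns the next batch with at least one row, or null if the sender is done.
  static int[] nextNonEmpty(Iterator<int[]> batches) {
    while (batches.hasNext()) {
      int[] batch = batches.next();
      if (batch.length > 0) {
        return batch;
      }
    }
    return null;
  }

  // Merges sorted batches from all senders into one sorted output.
  static List<Integer> merge(List<List<int[]>> senders) {
    int n = senders.size();
    List<Iterator<int[]>> streams = new ArrayList<>();
    // Simplified "hyper vector": slot s holds the current batch of sender s.
    int[][] hyper = new int[n][];
    // Queue entries are {senderIndex, rowIndex}, ordered by the value they point at.
    PriorityQueue<int[]> queue = new PriorityQueue<>(
        (a, b) -> Integer.compare(hyper[a[0]][a[1]], hyper[b[0]][b[1]]));

    // Get the first batch from each sender and seed the queue.
    for (int s = 0; s < n; s++) {
      streams.add(senders.get(s).iterator());
      hyper[s] = nextNonEmpty(streams.get(s));
      if (hyper[s] != null) {
        queue.add(new int[] {s, 0});
      }
    }

    List<Integer> outgoing = new ArrayList<>();
    while (!queue.isEmpty()) {
      int[] top = queue.poll();
      int s = top[0];
      int row = top[1];
      // Copy from the sender's slot in the hyper vector to the outgoing vector.
      outgoing.add(hyper[s][row]);
      if (row + 1 < hyper[s].length) {
        queue.add(new int[] {s, row + 1});
      } else {
        // End of this sender's batch: load its next batch, if any.
        hyper[s] = nextNonEmpty(streams.get(s));
        if (hyper[s] != null) {
          queue.add(new int[] {s, 0});
        }
      }
    }
    return outgoing;
  }

  public static void main(String[] args) {
    List<List<int[]>> senders = Arrays.asList(
        Arrays.asList(new int[] {1, 4}, new int[] {7}),
        Arrays.asList(new int[] {2, 3}, new int[] {5, 6}));
    System.out.println(merge(senders));
  }
}
```

Because the queue holds at most one entry per sender, replacing a sender's slot after its entry has been popped never invalidates an entry still in the queue.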
> If any of the senders does not send a first batch with a schema and we skip
> adding that batch to the hyper vector, the hyper vector is not set up
> correctly, and all the offsets from the selection vector to the individual
> sender batches within the hyper vector are wrong.
> The fix for this problem: when we receive an empty batch from any of the
> senders, create a dummy batch with the schema from one of the other senders
> and add it to the hyper vector.
> If all senders send empty first batches, we just return NONE to the
> downstream operator.
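
The fix described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in the pull request: Batch, prepareFirstBatches, and the use of a null schema to model "no rows and no schema" are all hypothetical, and a null return stands in for returning NONE downstream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the first-batch handling; not Drill's real classes.
class EmptyBatchFixSketch {

  // Hypothetical batch: a null schema models an empty batch (no rows, no schema).
  static final class Batch {
    final List<String> schema;
    final int rowCount;

    Batch(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }

    boolean isEmpty() {
      return schema == null;
    }
  }

  // Returns one hyper vector slot per sender, or null when every first batch
  // is empty (the caller would then return NONE to the downstream operator).
  static List<Batch> prepareFirstBatches(List<Batch> firstBatches) {
    // Borrow the schema from the first sender that actually sent one.
    List<String> schema = null;
    for (Batch batch : firstBatches) {
      if (!batch.isEmpty()) {
        schema = batch.schema;
        break;
      }
    }
    if (schema == null) {
      return null;
    }

    // Keep every sender's slot so offsets into the hyper vector stay aligned.
    List<Batch> slots = new ArrayList<>();
    for (Batch batch : firstBatches) {
      slots.add(batch.isEmpty() ? new Batch(schema, 0) : batch);
    }
    return slots;
  }

  public static void main(String[] args) {
    Batch full = new Batch(Arrays.asList("a", "b"), 5);
    Batch empty = new Batch(null, 0);
    List<Batch> slots = prepareFirstBatches(Arrays.asList(empty, full, empty));
    for (Batch slot : slots) {
      System.out.println(slot.schema + " rows=" + slot.rowCount);
    }
  }
}
```

The point of the dummy batch is that the number of slots always equals the number of senders, so an index derived from a sender's position still refers to that sender's batch.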