[ 
https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025562#comment-15025562
 ] 

Deneche A. Hakim commented on DRILL-3845:
-----------------------------------------

I changed the UnorderedReceiver to not kill its providers until it receives 
the "last batch" (you can see the change 
[here|https://github.com/adeneche/incubator-drill/commit/5dbd9fdc88b1c802dff3509dee85416efa3dac15]),
 but now some queries fail with the following error:
{noformat}
Error: SYSTEM ERROR: IllegalStateException: Cleanup before finished. 0 out of 1 
strams have finished
{noformat}

Fixing the receiver alone doesn't enforce the protocol. Senders still close 
their fragment as soon as they receive a "kill signal", causing their receivers 
to close before they get the "final batch", which throws the error above.

[~jnadeau] and [~sphillips]: is it valid to change the protocol such that 
receivers can terminate before they get their "final batch" (which is already 
the case sometimes) and senders don't send the "final batch" to receivers that 
have already finished (i.e. they sent a "receiver finished" message)?
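
To make the question concrete, here is a rough sketch of the proposed rule. 
SenderSketch, ReceiverSketch and all of their methods are hypothetical names 
made up for illustration; they are not Drill classes and this is not the 
actual sender or receiver code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a sender; not a Drill class.
class SenderSketch {
    private final List<Integer> receivers;
    private final Set<Integer> finishedReceivers = new HashSet<>();
    private final List<String> sent = new ArrayList<>();

    SenderSketch(List<Integer> receivers) {
        this.receivers = new ArrayList<>(receivers);
    }

    // A receiver told us it needs no more data ("receiver finished").
    void onReceiverFinished(int receiverId) {
        finishedReceivers.add(receiverId);
    }

    // When the sender is done, it sends the "final batch" only to
    // receivers that have not already reported that they finished.
    void close() {
        for (int receiverId : receivers) {
            if (!finishedReceivers.contains(receiverId)) {
                sent.add("final-batch->" + receiverId);
            }
        }
    }

    // Messages recorded by close(), in receiver order.
    List<String> sentMessages() {
        return sent;
    }
}

// Hypothetical model of a receiver; not a Drill class.
class ReceiverSketch {
    private boolean gotFinalBatch;
    private boolean sentReceiverFinished;

    void onFinalBatch() {
        gotFinalBatch = true;
    }

    // The receiver decides it is done early and notifies the sender.
    void finishEarly() {
        sentReceiverFinished = true;
    }

    // Under the proposed rule, cleanup is legal in either case.
    boolean mayCleanUp() {
        return gotFinalBatch || sentReceiverFinished;
    }
}
```

With this rule a receiver that finished early never waits for a batch that 
the sender will no longer send, and the sender never targets a fragment that 
may already be gone.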


> UnorderedReceiver shouldn't terminate until it receives a final batch
> ---------------------------------------------------------------------
>
>                 Key: DRILL-3845
>                 URL: https://issues.apache.org/jira/browse/DRILL-3845
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 1.4.0
>
>         Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill
>
>
> Even if a receiver has finished and informed the corresponding partition 
> sender, the sender will still try to send a "last batch" to the receiver when 
> it's done. In most cases this is fine, as those batches will be silently 
> dropped by the receiving DataServer, but if a receiver finished more than 10 
> minutes ago, DataServer will throw an exception because it can't find the 
> corresponding FragmentManager (WorkEventBus has a 10-minute recentlyFinished 
> cache).
> DRILL-2274 is a reproduction for this case (after the corresponding fix is 
> applied).
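
The timing behaviour described above can be modelled with a small sketch. 
RecentlyFinishedSketch, its methods and the exception message are 
hypothetical and only illustrate the 10-minute window; this is not how 
WorkEventBus or DataServer are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a "recently finished" window; not Drill code.
class RecentlyFinishedSketch {
    // 10 minutes, matching the window mentioned in the description.
    static final long WINDOW_MILLIS = 10 * 60 * 1000L;

    // fragment id -> time (in millis) at which the fragment finished
    private final Map<String, Long> finishedAt = new HashMap<>();

    void markFinished(String fragmentId, long nowMillis) {
        finishedAt.put(fragmentId, nowMillis);
    }

    // Returns true when a late batch is silently dropped because the
    // fragment finished within the window. Throws when the fragment is
    // unknown, either because it never finished or its entry expired.
    boolean dropLateBatch(String fragmentId, long nowMillis) {
        Long finished = finishedAt.get(fragmentId);
        if (finished != null && nowMillis - finished <= WINDOW_MILLIS) {
            return true;
        }
        finishedAt.remove(fragmentId); // forget the expired entry
        throw new IllegalStateException(
            "No fragment manager found for " + fragmentId);
    }
}
```

A "last batch" arriving 5 minutes after the receiver finished is dropped 
quietly, while one arriving 11 minutes later hits the exception path.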



