[ https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098806#comment-15098806 ]

ASF GitHub Bot commented on DRILL-3845:
---------------------------------------

Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/319#discussion_r49781604
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionerTemplate.java ---
    @@ -286,7 +286,7 @@ public void flush(boolean schemaChanged) throws IOException {
           //      sender has acknowledged the terminate request. After sending the last batch, all further batches are
           //      dropped.
           //   3. Partitioner thread is interrupted due to cancellation of fragment.
    -      final boolean isLastBatch = isLast || terminated || Thread.currentThread().isInterrupted();
    +      final boolean isLastBatch = isLast || Thread.currentThread().isInterrupted();
    --- End diff --
    
    That is true. There are two possible ways to fix this problem:
    - either we change all receivers so they no longer wait for the last batch
      in the case of an early termination,
    - or we make sure the partition sender sends the last batch as soon as
      possible, to avoid the case where it is sent too late.
    
    As we discussed, the first solution is not easy to enforce because even the
    senders don't respect it: for example, a single sender that receives an early
    termination message will close the fragment without letting its receivers
    wait for the last batch.
    
    I will update the PR to implement the second solution.
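
The second option can be pictured with a small, purely hypothetical sketch. It is not the PartitionerTemplate code and not the change made in the PR; the class `LastBatchSketch`, its `Partition` type, and the `"LAST"` marker are names invented for illustration. The assumption is that each outgoing partition sends its last batch exactly once, immediately when its receiver reports early termination, and drops anything after that:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from Drill. */
final class LastBatchSketch {

  /** One outgoing partition feeding a single receiver. */
  static final class Partition {
    // Stands in for the data tunnel: records what would go over the wire.
    final List<String> sent = new ArrayList<>();
    private boolean lastBatchSent = false;

    /** Sends a data batch unless the stream to this receiver is already closed. */
    void send(String batch) {
      if (lastBatchSent) {
        return; // batches after the last batch are dropped
      }
      sent.add(batch);
    }

    /** Receiver asked for early termination: send the last batch right away. */
    void onReceiverTerminated() {
      sendLastBatch();
    }

    /** End of input or interruption: send the last batch if it was not sent yet. */
    void finish() {
      sendLastBatch();
    }

    private void sendLastBatch() {
      if (!lastBatchSent) {
        sent.add("LAST");
        lastBatchSent = true;
      }
    }
  }

  public static void main(String[] args) {
    Partition p = new Partition();
    p.send("b1");
    p.onReceiverTerminated(); // last batch goes out here, not at finish()
    p.send("b2");             // dropped
    p.finish();               // no second last batch
    System.out.println(p.sent); // prints [b1, LAST]
  }
}
```

In this sketch the last batch can never reach a receiver long after that receiver finished, because it is sent at the moment the termination is acknowledged rather than when the sender itself completes.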


> PartitionSender doesn't send last batch for receivers that already terminated
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-3845
>                 URL: https://issues.apache.org/jira/browse/DRILL-3845
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Deneche A. Hakim
>            Assignee: Jacques Nadeau
>             Fix For: 1.5.0
>
>         Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill
>
>
> Even if a receiver has finished and informed the corresponding partition
> sender, the sender will still try to send a "last batch" to that receiver
> when it is done. In most cases this is fine, as those batches are silently
> dropped by the receiving DataServer. But if the receiver finished more than
> 10 minutes ago, DataServer throws an exception because it cannot find the
> corresponding FragmentManager (WorkEventBus has a 10-minute recentlyFinished
> cache).
> DRILL-2274 is a reproduction for this case (after the corresponding fix is
> applied).
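
The failure mode described above can be sketched as follows. This is a hypothetical stand-in, not Drill's WorkEventBus or DataServer: the class `RecentlyFinishedSketch`, its methods, and the injected clock are assumptions made for illustration. Only the 10-minute window comes from the issue description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a "recently finished" window; not Drill's code. */
final class RecentlyFinishedSketch {
  // 10 minutes, as stated in the issue description.
  static final long TTL_MILLIS = 10L * 60 * 1000;

  private final Map<String, Long> finishedAt = new HashMap<>();
  private final LongSupplier clock; // injected so expiry can be exercised without waiting

  RecentlyFinishedSketch(LongSupplier clock) {
    this.clock = clock;
  }

  /** Records the time at which a receiving fragment finished. */
  void markFinished(String fragmentId) {
    finishedAt.put(fragmentId, clock.getAsLong());
  }

  /** Handles a batch that arrives for a fragment that is no longer running. */
  void onLateBatch(String fragmentId) {
    Long finished = finishedAt.get(fragmentId);
    if (finished != null && clock.getAsLong() - finished < TTL_MILLIS) {
      return; // finished recently: drop the batch silently
    }
    finishedAt.remove(fragmentId); // entry has expired or never existed
    throw new IllegalStateException("No fragment manager for " + fragmentId);
  }

  public static void main(String[] args) {
    long[] now = {0L};
    RecentlyFinishedSketch bus = new RecentlyFinishedSketch(() -> now[0]);
    bus.markFinished("fragment-1");

    now[0] = 9L * 60 * 1000;
    bus.onLateBatch("fragment-1"); // within the window: dropped silently

    now[0] = 11L * 60 * 1000;
    try {
      bus.onLateBatch("fragment-1"); // after the window: rejected
    } catch (IllegalStateException e) {
      System.out.println("late batch rejected: " + e.getMessage());
    }
  }
}
```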



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
