[ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393397#comment-16393397
 ] 

ASF GitHub Bot commented on DRILL-6125:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1105#discussion_r173542042
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java ---
    @@ -227,11 +280,19 @@ public void run() {
             @Override
             public Void run() throws Exception {
               injector.injectChecked(fragmentContext.getExecutionControls(), "fragment-execution", IOException.class);
    -          /*
    -           * Run the query until root.next returns false OR we no longer need to continue.
    -           */
    -          while (shouldContinue() && root.next()) {
    -            // loop
    +
    +          while (shouldContinue()) {
    +            // Fragment is not cancelled
    +
    +            for (FragmentHandle fragmentHandle; (fragmentHandle = receiverFinishedQueue.poll()) != null;) {
    --- End diff ---
    
The original code allowed PartitionSenderRootExec#receivingFragmentFinished()
to be called before, after, or concurrently with
PartitionSenderRootExec#innerNext(). The idea was that
PartitionSenderRootExec#receivingFragmentFinished would terminate a partition.
The terminate method of OutgoingRecordBatch would then cause all outgoing
records destined for the finished receiver to be dropped. So the finished
downstream receivers would stop receiving data from the
PartitionSenderRootExec, while the downstream receivers that had not finished
would continue to receive records. There is a gray area here because I'm not
sure why only some of the downstream receivers would send a RECEIVER_FINISHED
message while others wouldn't, but the original design seems to assume this is
a common case and optimizes for it. So I assume the original authors knew
something we don't in that respect.
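
    To make that interplay concrete, here is a minimal, hypothetical Java
sketch of the idea described above. The class and method shapes
(OutgoingBatchSketch, PartitionSenderSketch, append, hash) are illustrative
stand-ins and not the actual Drill PartitionSenderRootExec /
OutgoingRecordBatch code; only the terminate / receivingFragmentFinished /
innerNext roles come from the comment above.

    // Hypothetical stand-in for OutgoingRecordBatch: once terminated,
    // records routed to this receiver are silently dropped.
    class OutgoingBatchSketch {
      private volatile boolean terminated = false;

      void terminate() {
        terminated = true;
      }

      void append(Object record) {
        if (terminated) {
          return; // the receiver reported RECEIVER_FINISHED, drop the record
        }
        // ... buffer/flush the record to the receiver ...
      }
    }

    // Hypothetical stand-in for the sender: terminating one partition does
    // not affect the receivers that have not finished.
    class PartitionSenderSketch {
      private final OutgoingBatchSketch[] outgoing;

      PartitionSenderSketch(int receivers) {
        outgoing = new OutgoingBatchSketch[receivers];
        for (int i = 0; i < receivers; i++) {
          outgoing[i] = new OutgoingBatchSketch();
        }
      }

      // Analogue of receivingFragmentFinished(): only the named partition
      // is terminated; the other receivers keep getting records.
      void receivingFragmentFinished(int receiverIndex) {
        outgoing[receiverIndex].terminate();
      }

      // Analogue of innerNext(): route each incoming record to its partition.
      void innerNext(Object[] batch) {
        for (Object record : batch) {
          outgoing[(record.hashCode() & 0x7fffffff) % outgoing.length].append(record);
        }
      }
    }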
    
    Also, calling receivingFragmentFinished only after we had finished
processing all the data would defeat the purpose, since the intention was to
allow our fragment to terminate early if the downstream receivers decided they
don't need any more data (e.g. a select-with-limit query which only asks for
the first 100 rows of a result with 10 million rows). If we called it only
after the next() loop was done, we would always process all upstream records
even when we didn't have to, and we would see a significant degradation in the
performance of limit queries.
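
    A hedged, standalone sketch of the pattern the diff above introduces,
under the assumption that finished-receiver events are queued by the
event-processor thread and drained by the fragment thread between batches.
Only receiverFinishedQueue and shouldContinue() appear in the diff; everything
else here (FragmentLoopSketch, onReceiverFinished, the IntConsumer callback)
is illustrative.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.function.IntConsumer;

    class FragmentLoopSketch {
      private final Queue<Integer> receiverFinishedQueue = new ConcurrentLinkedQueue<>();
      private volatile boolean cancelled = false;

      // Event-processor thread: record that a downstream receiver reported
      // RECEIVER_FINISHED; the fragment thread acts on it on its next pass.
      void onReceiverFinished(int receiverIndex) {
        receiverFinishedQueue.add(receiverIndex);
      }

      boolean shouldContinue() {
        return !cancelled;
      }

      // Fragment thread: between batches, drain the queue and terminate the
      // finished partitions, then process the next batch. Applying the
      // terminations here, rather than only after the loop has ended, is what
      // lets a downstream LIMIT stop the sender before all upstream data has
      // been read.
      void run(IntConsumer terminatePartition, Iterable<Object[]> upstream) {
        for (Object[] batch : upstream) {
          if (!shouldContinue()) {
            break;
          }
          for (Integer idx; (idx = receiverFinishedQueue.poll()) != null; ) {
            terminatePartition.accept(idx);
          }
          // ... hand the batch to the operator tree (root.innerNext()) ...
        }
      }
    }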
    



> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-6125
>                 URL: https://issues.apache.org/jira/browse/DRILL-6125
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Minor
>             Fix For: 1.14.0
>
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the
> *partitioner* field. The creation of the partitioner happens in the
> createPartitioner method. This method gets called by the main fragment
> thread. The partitioner field is accessed by the fragment thread during
> normal execution, but it can also be accessed by the
> receivingFragmentFinished method, which is a callback executed by the event
> processor thread. Because multiple threads can access the partitioner field,
> synchronization is done on creation and when receivingFragmentFinished is
> called. However, the close method can also be called by the event processor
> thread, and the close method does not synchronize before accessing the
> partitioner field. Because synchronization is not done, the event processor
> thread may have an old reference to the partitioner when a query
> cancellation happens. Since it has an old reference, the current partitioner
> may not be cleared and a memory leak may occur.
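
For illustration, a simplified sketch of the race described in the issue; it
is not the real PartitionSenderRootExec code, and the field, lock, and method
bodies are placeholders. The point is only that createPartitioner and
receivingFragmentFinished guard the field while the buggy close does not.

    class PartitionSenderRaceSketch {
      private final Object lock = new Object();
      private Object partitioner;            // stands in for PartitionerDecorator

      // Fragment thread: (re)creates the partitioner under the lock.
      void createPartitioner() {
        synchronized (lock) {
          partitioner = new Object();        // new decorator owning buffers
        }
      }

      // Event-processor thread callback: synchronized, so it always sees the
      // most recently created partitioner.
      void receivingFragmentFinished(int receiverIndex) {
        synchronized (lock) {
          // ... terminate receiverIndex on the current partitioner ...
        }
      }

      // Event-processor thread on cancellation: NOT synchronized in the buggy
      // version, so it may act on a stale reference and never clear the
      // current partitioner, leaking its memory.
      void close() {
        Object possiblyStale = partitioner;  // unsynchronized, possibly stale read
        // ... clear possiblyStale; the newest partitioner is never cleared ...
      }

      // Sketch of the fix: take the same lock before touching the field.
      void closeFixed() {
        synchronized (lock) {
          // ... clear the up-to-date partitioner reference ...
        }
      }
    }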



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
