ASF GitHub Bot commented on DRILL-6125:

Github user ilooner commented on the issue:

    @arina-ielchiieva @vrozov In light of Vlad's comments I have reworked the 
synchronization model yet again. This change now removes all synchronization 
from PartitionSenderRootExec and enforces the guarantee that all the lifecycle 
methods of the PartitionSenderRootExec will only be called by a single Run 
thread in the FragmentExecutor. Also while making this change I discovered a 
few other bugs with how cancellations and receiver finishes are handled, so I 
have addressed those bugs as well. I will go into more detail about what I 
changed below.
    # Motivation
    As Vlad pointed out **close** and **innerNext** are never called 
concurrently. After closer inspection of the code I also released that 
currently (in apache master) innerNext and close will always be called by the 
**FragmentExecutor#run** thread. The only method of PartitionSenderRootExec 
that is not called by the **FragmentExecutor#run** thread is 
**receivingFragmentFinished**. In order to simplify the implementation of 
PartitionSenderRootExec and also simplify the design of the FragmentExecutor I 
changed the code so that only the **FragmentExecutor#run** thread calls 
**receivingFragmentFinished** as well. In this way we can remove all the 
synchronization from PartitionSenderRootExec. This was done by by:
     1. Making the event processor save the FragmentHandle in the event that a 
receiver finish request was sent. 
     2. After the **root.next()** loop terminates in **FragmentExecutor#run** 
the eventProcessor is checked to see if a finish request was received. If so 
**receivingFragmentFinished** is called on root by the **FragmentExecutor#run** 
    # Other Bugs
    ## Processing of multiple termination requests
    The event processor would process all cancellation and finish requests, 
even if there is more than one. This doesn't really make sense, since we can 
only cancel or finish once. So I have changed the event processor to execute 
only the first termination request and ignore all the others.
    ## Simultaneous Cancellation and Finishing
    Since the event processor would process multiple termination requests 
concurrently it was possible for a cancel and a finish message to be received 
and processed simultaneously. The results of this were not well defined since 
**close** and **receivingFragmentFinished** could be called concurrently.
    # Other Improvements
    Vlad also pointed out that we did not need the hasCloseoutThread atomic 
reference, since we were already using the myThreadRef atomic reference. That 
cleaned up the code a bit.

> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> --------------------------------------------------------------------------------
>                 Key: DRILL-6125
>                 URL: https://issues.apache.org/jira/browse/DRILL-6125
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Minor
>             Fix For: 1.14.0
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The creation of the partitioner happens in the 
> createPartitioner method. This method get's called by the main fragment 
> thread. The partitioner field is accessed by the fragment thread during 
> normal execution but it can also be accessed by the receivingFragmentFinished 
> method which is a callback executed by the event processor thread. Because 
> multiple threads can access the partitioner field synchronization is done on 
> creation and on when receivingFragmentFinished. However, the close method can 
> also be called by the event processor thread, and the close method does not 
> synchronize before accessing the partitioner field. Since synchronization is 
> not done the event processor thread may have an old reference to the 
> partitioner when a query cancellation is done. Since it has an old reference 
> the current partitioner can may not be cleared and a memory leak may occur.

This message was sent by Atlassian JIRA

Reply via email to