[ 
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704007#comment-15704007
 ] 

Paul Rogers commented on DRILL-5083:
------------------------------------

The looping call stack shown earlier runs from the {{BaseRootExec}} to the 
{{RecordIterator}} on top of the {{MergeJoin}}. Notice that the usual 
operators (screen, etc.) are missing from the stack. This means that 
{{DeferredException.suppressingClose}} is working its way through the operator 
list, calling {{close()}} as it goes. The {{close()}} for the screen has 
already been called. This particular query also had a Project, a Writer and so 
on, all of which seem to have been closed out.

The sequence seems to be, roughly:

* FragmentExecutor calls close() on each operator in the tree
* Each operator calls next() on its input(s) to clear out incoming batches
* A parent operator therefore calls next() on its child (or children) without 
first checking whether that child has already failed

This means that next() is called multiple times for some operators, creating 
opportunities for havoc.
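
The sequence above can be illustrated with a small, self-contained sketch. It is not Drill code: {{ValidatingChild}} and {{CleanupParent}} are hypothetical stand-ins, and the check in {{next()}} is only modeled loosely on the {{exceptionState}} test quoted in the issue description.

```java
// Hypothetical stand-in for a validated child operator. Not Drill's API.
class ValidatingChild {
  // Remembers the first failure, similar in spirit to exceptionState.
  private RuntimeException exceptionState;
  int nextCalls = 0;

  String next() {
    nextCalls++;
    if (exceptionState != null) {
      // A second call after a failure is treated as a protocol violation.
      throw new IllegalStateException("next() called after failure", exceptionState);
    }
    // Simulate the injected failure on the first call.
    exceptionState = new RuntimeException("simulated OOM in spill");
    throw exceptionState;
  }
}

// Hypothetical parent that drains its input during close().
class CleanupParent {
  private final ValidatingChild child;

  CleanupParent(ValidatingChild child) {
    this.child = child;
  }

  // Normal data path: pull one batch from the child.
  void pull() {
    child.next();
  }

  // Cleanup path: calls next() again without checking for an earlier failure.
  void close() {
    child.next();
  }
}
```

In this sketch the first {{pull()}} surfaces the simulated failure, and the later {{close()}} turns into an {{IllegalStateException}} because the child sees a second {{next()}} call after it has already failed.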

Proposed (partial, temporary) fix: create an error state in the base operator 
class that is set on each exception, but which _is never cleared_.

When next() is called, check that state and do special clean-up code.
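
A minimal sketch of this idea follows. All names here ({{SketchOperator}}, {{Outcome}}, {{doNext()}}, {{cleanupAfterFailure()}}, {{FailingSort}}) are hypothetical and chosen for illustration; they are not the existing Drill base class or its outcome enum.

```java
// Hypothetical outcome values; not Drill's real iterator outcomes.
enum Outcome { OK, NONE, STOP }

// Hypothetical base operator with a sticky error state.
abstract class SketchOperator {
  // Set on the first exception and never cleared.
  private boolean failed;

  // Callers use next(); subclasses implement doNext().
  final Outcome next() {
    if (failed) {
      // Already failed: skip normal processing and only clean up.
      cleanupAfterFailure();
      return Outcome.STOP;
    }
    try {
      return doNext();
    } catch (RuntimeException e) {
      failed = true; // remember the failure for all later calls
      throw e;
    }
  }

  boolean hasFailed() {
    return failed;
  }

  protected abstract Outcome doNext();

  // Subclasses may release buffers, drain inputs, and so on.
  protected void cleanupAfterFailure() { }
}

// Example operator that always fails, to exercise the error path.
class FailingSort extends SketchOperator {
  int cleanupCalls = 0;

  @Override
  protected Outcome doNext() {
    throw new IllegalStateException("simulated OOM during spill");
  }

  @Override
  protected void cleanupAfterFailure() {
    cleanupCalls++;
  }
}
```

With this shape, a second {{next()}} on a failed operator returns a terminal outcome and runs clean-up instead of re-entering normal processing. The cost is the extra flag check on every call.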

Unfortunately, the above means that we'll do the "have I failed" check for 
billions of calls unnecessarily. A better (more permanent) solution would be 
to have a separate clear() call that does any clean-up reads. Or, simply push 
the work into close().
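
One possible shape for that alternative is sketched below, under the assumption that clean-up only needs to release batches the operator already holds. {{CountingChild}} and {{DrainingParent}} are hypothetical names, and strings stand in for record batches.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical upstream operator that counts how often next() is called.
class CountingChild {
  int nextCalls = 0;

  String next() {
    nextCalls++;
    return "batch-" + nextCalls;
  }
}

// Hypothetical parent that keeps clean-up out of the next() path.
class DrainingParent implements AutoCloseable {
  private final CountingChild child;
  private final Deque<String> inflight = new ArrayDeque<>();

  DrainingParent(CountingChild child) {
    this.child = child;
  }

  // Normal data path: the only place that calls child.next().
  void fetch() {
    inflight.add(child.next());
  }

  int inflightCount() {
    return inflight.size();
  }

  // Separate clean-up call: releases held batches without reading more.
  void clear() {
    inflight.clear();
  }

  // close() delegates to clear(), so it never touches child.next().
  @Override
  public void close() {
    clear();
  }
}
```

Here {{next()}} stays free of failure checks, and closing an operator never re-enters its child. Whether real operators can clean up without further reads is exactly the part that needs investigation.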

Much more investigation is needed, perhaps for each and every operator.

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> ------------------------------------------------------------------------
>
>                 Key: DRILL-5083
>                 URL: https://issues.apache.org/jira/browse/DRILL-5083
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something 
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a 
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>       // Check whether next() should even have been called in current state.
>       if (null != exceptionState) {
>         throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an 
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup, 
> then {{next( )}} should gracefully handle that case.


