[
https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704007#comment-15704007
]
Paul Rogers commented on DRILL-5083:
------------------------------------
The looping call stack shown earlier calls from the {{BaseRootExec}} to the
{{RecordIterator}} on top of the {{MergeJoin}}. Notice that the normal
operators are missing: screen, etc. This means that
{{DeferredException.suppressingClose}} is working its way through the operator
list, calling {{close()}} as it goes. The {{close()}} for the screen has
already been called. This particular query also had a Project, a Writer and so
on; all of which seem to have been closed out.
It seems that the sequence is, roughly, that
* FragmentExecutor calls close on each operator in the tree
* Each operator calls next( ) on its input(s) to clear out incoming batches
* Without error checking, a parent operator calls next() on its child (or
children)
This means that next() is called multiple times for some operators, creating
opportunities for havoc.
Proposed (partial, temporary) fix: create an error state in the base operator
class that is set on each exception, but which _is never cleared_.
When next() is called, check that state and do special clean-up code.
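A minimal sketch of that sticky error state, to make the idea concrete. The names here ({{SketchOperator}}, {{Outcome}}, {{doNext()}}, {{FailingSort}}) are hypothetical and are not Drill's actual classes; the only point is that the flag is set on the first exception, is never cleared, and is checked at the top of every next():

```java
// Hypothetical sketch: illustrative names only, not Drill's real operator classes.
abstract class SketchOperator {
  enum Outcome { OK, NONE, STOP }

  // Set on the first exception and never cleared afterwards.
  private Throwable failure;

  // Subclasses put their normal per-batch work here.
  protected abstract Outcome doNext() throws Exception;

  public final Outcome next() {
    if (failure != null) {
      // Called again after a failure (for example during cleanup):
      // skip the normal work and report that nothing more is coming.
      return Outcome.STOP;
    }
    try {
      return doNext();
    } catch (Exception e) {
      failure = e;                    // remember the failure permanently
      throw new RuntimeException(e);  // still propagate it to the caller
    }
  }

  public final boolean hasFailed() {
    return failure != null;
  }
}

// Stand-in for an operator that fails on its first batch.
class FailingSort extends SketchOperator {
  int calls = 0;

  @Override
  protected Outcome doNext() throws Exception {
    calls++;
    throw new Exception("simulated OOM in spill");
  }
}
```

With this shape, a second next() issued by a parent during close() returns STOP instead of re-entering the failed operator.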
Unfortunately, the above means that we'll do the "have I failed" check for
billions of calls unnecessarily. A better (more permanent) solution would be to
have a separate clear( ) call that does any clean-up reads. Or, simply push the
work into close( ).
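A rough sketch of the clear( ) alternative, under the assumption that each operator can release its own in-flight batches without reading from its inputs. {{SketchBatchSource}} and {{SketchJoin}} are hypothetical stand-ins, not Drill classes; the point is only that the parent's close() never calls next() on a child:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: illustrative names only, not Drill's real classes.
class SketchBatchSource implements AutoCloseable {
  private final Deque<byte[]> inFlight = new ArrayDeque<>();
  private int nextCalls = 0;
  private boolean closed = false;

  SketchBatchSource(int batches) {
    for (int i = 0; i < batches; i++) {
      inFlight.add(new byte[16]); // stand-in for a buffered record batch
    }
  }

  // Normal data path: hands out one buffered batch per call.
  boolean next() {
    nextCalls++;
    if (closed) {
      throw new IllegalStateException("next() called after close()");
    }
    return inFlight.poll() != null;
  }

  // Cleanup path: drops buffered batches without going through next().
  void clear() {
    inFlight.clear();
  }

  @Override
  public void close() {
    clear();
    closed = true;
  }

  int nextCalls() { return nextCalls; }
  int pending() { return inFlight.size(); }
}

class SketchJoin implements AutoCloseable {
  private final SketchBatchSource left;
  private final SketchBatchSource right;

  SketchJoin(SketchBatchSource left, SketchBatchSource right) {
    this.left = left;
    this.right = right;
  }

  @Override
  public void close() {
    // Ask each input to release its batches; never call next() here.
    left.clear();
    right.clear();
  }
}
```

Because cleanup no longer shares a code path with the data path, next() needs no per-call failure check.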
Much more investigation is needed, perhaps for each and every operator.
> IteratorValidator does not handle RecordIterator cleanup call to next( )
> ------------------------------------------------------------------------
>
> Key: DRILL-5083
> URL: https://issues.apache.org/jira/browse/DRILL-5083
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Priority: Minor
>
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something
> like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force an OOM in spill, the external sort threw a
> UserException up the stack. This was handled by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
> // Check whether next() should even have been called in current state.
> if (null != exceptionState) {
> throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an
> IllegalStateException during cleanup.
> Seems the code should agree: if {{next( )}} will be called during cleanup,
> then {{next( )}} should gracefully handle that case.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)