[
https://issues.apache.org/jira/browse/DRILL-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036417#comment-17036417
]
ASF GitHub Bot commented on DRILL-7583:
---------------------------------------
paul-rogers commented on pull request #1981: DRILL-7583: Remove STOP status
from operator outcome
URL: https://github.com/apache/drill/pull/1981
Removes the `IterOutcome.STOP` status and revises the `kill` methods.
## Description
Operators must handle both the "happy path" and failures. The "happy path"
is defined by the `RecordBatch.IterOutcome` enum: each operator's `next()`
method returns this outcome to describe what happened: batch, batch with new
schema, or EOF.
Operators originally handled the error path the same way: via an
`IterOutcome` of `STOP`. Error handling was somewhat complex:
* Set a failed flag in the operator,
* Tell upstream operators they have failed,
* Consuming data from the upstream until it returns `NONE`,
* Originally, sometimes calling `close()`,
* Returning a STOP status.
Each operator implemented some variation or subset of the above. Early on,
the error protocol was source of errors. I actually wrote a detailed analysis
of the issues several years ago, and have been chipping away at fixing the
problems ever since.
Today, the error path mostly works; is simply a source of highly complex
code. And, as it turns out, entirely unnecessary.
Drill has long supported a "fail-fast" error path based on throwing an
exception; relying on the fragment executor to clean up the operator stack. The
"fail-fast" system is simple (throw an exception) and has a uniform way of
cleaning up (call `close()` once on each operator.) Recent revisions have
converted most operators to use the simpler fail-fast strategy based on
throwing an exception instead of using the older STOP approach.
The careful reader of those PRs will have noted that, as a result, no code
returns the `STOP` status any longer. This PR goes to the next logical step
and removes the old, complex, and now-unused `STOP` based path.
There is a related mechanism in operators: the `kill()` method, and its
implementation, `killIncoming()`. The original purpose appears to be that step
above: when an operator fails, fail all its children. Things got messy when an
operator received a `STOP` status: should it call `kill()` on its child? On
itself? Lots of fun bugs to fix back in the day.
With `STOP` retired, the main purpose of the `kill()` method disappears.
There is, however, a second use that we retain: cancelling upstream operators
in the "normal" case. Think of the LIMIT operator: once a `LIMIT 10` has 10
rows, it need not ask for more. A Parquet reader, say, may still be busily
reading columns in worker threads. The `LIMIT` wants to tell its upstream
operators, "hey, I have all the rows I need, thanks. You can go ahead and stop
reading more data." That is, we need a "normal case" cancellation.
To make clear that `kill()` now does cancellation, this PR renames the
method to `cancel()`.
## Documentation
This is not a user-visible change, it is purely internal to the execution
engine. (The one user benefit is that error messages might be a bit more
precise because of earlier work.)
## Testing
Reran all unit tests which revealed a few complexities that remain, such as
the way operators handle the `cancel()` call. All tests pass. The next step is
to run the full test suite as part of pre-commit.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Remove STOP status in favor of fail-fast
> ----------------------------------------
>
> Key: DRILL-7583
> URL: https://issues.apache.org/jira/browse/DRILL-7583
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
>
>
> The original error solution was a complex process of a) setting a failed
> flag, b) telling all upstream operators they have failed, c) returning a
> {{STOP}} status. Drill has long supported a "fail-fast" error path based on
> throwing an exception; relying on the fragment executor to clean up the
> operator stack. Recent revisions have converted most operators to use the
> simpler fail-fast strategy based on throwing an exception instead of using
> the older {{STOP}} approach. This change simply removes the old, now-unused
> {{STOP}} based path.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)