[GitHub] [druid] jihoonson commented on issue #11165: [Feature Proposal] Improve error reporting for task and query failures

GitBox Tue, 04 May 2021 17:30:17 -0700


jihoonson commented on issue #11165:
URL: https://github.com/apache/druid/issues/11165#issuecomment-832341499

Hmm, it was a bit hard to understand what is the problem you want solve and
how your proposal can address it. If I understand correctly, the ultimate goal
is `Richer error messages detailing what went wrong and possible actions for
mitigation` for `Easier debugging and RCA (without looking at server logs)`.
And what you are proposing seems actually 2 things, 1) error propagation using
the error code and 2) a unified error handling and reporting system based on
1). The ultimate goal sounds reasonable if we are not doing it already, but
what it's unclear to me is how the unified system you are proposing achieves
the goal. To be more precise, the error propagation using the error code seems
enough to me to achieve the goal. So, here is my question around the point 2).

- From the user perspective, richer error message for easier debugging is
useful when the error is something that the user can work around by adjusting
some user settings.
- The Druid query system already has an error-code based error reporting
system (see [Query exception
class](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/query/QueryException.java)).
In the query system, there are a few [user-facing
errors](http://druid.apache.org/docs/latest/querying/querying.html#query-errors)
and the current error reporting system has been working well. I _think_ it
already provides richer error message with a potential workaround.
- If we didn't have this query error reporting system, it would be worth
considering having a unified system that can be used everywhere. However, given
that we already have such a system for querying, what is the benefit of having
a unified system?
- If you are thinking of the registrable errors in custom extensions, I
would say we don't probably need it for query system. You can currently develop
[custom aggregators, complexMetrics, or query
types](http://druid.apache.org/docs/latest/development/modules.html#adding-aggregators).
Custom aggregators and complexMetrics will mostly need to handle
not-enough-resources style errors which are covered by the error types
currently we have. If you develop a new query type, it will require you to
develop a new query engine as well to process your custom query type. You can
freely add your custom error type in this case.

Based on this, I would suggest focusing on the ingestion side in this
proposal. What do you think?

Besides the comment on introducing a unified system, here are my comments on
other parts.

> - Specifying the severity of errors and other potential side effects
> - Hiding implementation and other sensitive details from the end user

1. Can you elaborate more on these? What is the problem of the current
status you think? And how does the proposal address the problem?

2. How would the proposed error-code-based system integrate with the current
ingestion system? Looking at the code snippets, it seems like we have to modify
all codes where throw exceptions to convert them to `DruidTypedException`? If
this is what you are proposing, it seems pretty error-prone because it's easy
to forget catching and converting exceptions especially when they are unchecked
exceptions. Also, please add a code snippet of `TaskStatus` after this change.
It will help to understand the proposal.

> Reference containing rich documentation for each error code, thus greatly
simplifying debugging

3. This is one of the advantages you listed. However, I'm not quite sure how
it can simplify debugging because the error code will not be exposed to users?
Can you add some more details on this?

4. Thinking about what the "error code" actually is, is it simply a key to
choose the proper error formatter when the error is exposed to users?

5. In `DruidTypedException`, what does "typed" mean?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] jihoonson commented on issue #11165: [Feature Proposal] Improve error reporting for task and query failures

Reply via email to