paul-rogers opened a new issue, #13123:
URL: https://github.com/apache/druid/issues/13123
### Description
Proposed is a standardized, more robust form for Druid error messages to
upgrade the somewhat ad-hoc approach employed today. The design focuses on the
needs of the end user: what went wrong and how can they fix it. The design
allows incremental upgrade from the existing code.
This is a first-draft proposal to explore the basic idea of a revised error
mechanism. Comments and suggestions encouraged!
### Motivation
Druid uses Java exceptions for many different purposes
* The user provided input which is not valid. (An invalid native query, or
invalid SQL query, say.)
* The user configured the system incorrectly. (A required directory doesn’t
actually exist.)
* Capacity limits (out of memory, reached capacity limits)
* Detecting “should not occur” conditions during development
At present, Druid uses the same form of exception for all these cases.
Exceptions tend to have very simple messages which focus on the immediate
condition such as “invalid syntax” or “no more slots”. As with all Java
applications, developers rely on the stack trace to identify the context in
which the error occurs.
Users, however, find Java stack traces unhelpful. Within the last year or
two, Druid did work to sanitize errors, stripping off the stack trace and
removing unhelpful messages. Unfortulately, the result is often an error of the
form “something went wrong”, which does turn out to be helpful to users.
What users want is to know:
* What specific item went wrong?
* What do I need to do to fix the error?
That information cannot be inferred from the current error messages: it is
something that we have to think through and add to the error at the point it is
thrown.
This proposal discusses how we do this. This design is based on on
successfully used in Apache Drill.
### Goals
We propose to acknowledge that Druid exceptions serve two entirely different
needs: help a developer debug an issue with the code, and help a production
user diagnose problems related to their usage of the system. Any given
exception may serve both needs, though there will be many more that are
primarily for developers.
### Proposal
We propose to create a new exception class, say, `DruidException`, which
captures exceptions that are likely to be presented to the user. (Those
exceptions which are primarily internal can continue to use the existing
assortment of exception classes.)
A `DruidException`:
* Provides a clear description of the situation *from the user’s
perspective*.
* Provides context (such as the specific node, data source, column or other
item) that has a problem.
* Has an error “kind”: user error, config error, capacity limit, internal
error.
* Is logged for use in diagnosing problems after the fact
* Is presented to the user as nicely-formatted text without a stack trace
* May include an error code to map to SQL and JDBC error codes.
* May include the HTTP response code for the error.
### Style Guide
Error messages presented to users are, essentially, just-in-time
documentation. As such we’d want to craft user-visible error messages
carefully. To help with this, we should create a simple style guide that
explains what a good error message looks like, and common wording.
Older systems used a “message catalog” to separate the text of errors from
code. However, more recent Java applications just put the messages directly in
the code. Either is fine as long as we can ensure the messages are clear and
to-the-point.
### Details
Error messages are part of the Druid REST API. The proposed `DruidException`
must map into the existing API to avoid breaking compatibility. The existing
mechanism is based on a `SanitizableException` class that generates three
fields in a REST response:
* `error` - An error “code” such as "SQL query is unsupported"
* `errorMessage` - The message from the original exception, which is often
unclear.
* `errorClass` - ?
Unfortunately, these categories are not well defined or consistently used.
(That is one of the issues that lead to this proposal.) In practice, the fields
tend to be used inconsistent, and are often null.
We can refine the REST error response as follows:
* `error` - Brief summary of the error
* `errorMessage` - Multi-line, detailed description, including context
* `errorClass` - One of the major categories: user error, configuration
error, etc.
* `errorCode` - Optional in the case that we assign specific numeric or
string error codes, such as standard SQL error codes.
An example might be:
```json
{
“error”: {
“error”: “Invalid SQL”,
“errorMessage”: “The table ‘foo’ is not defined in schema ‘druid’.
Line 4, column 23”,
“errorClass”: “user”,
“sqlError”: “HV00R”
}
}
```
A `DruidException` would bypass the current sanitization logic and would
instead be translated to JSON using the new rules defined here, such as the
example format shown above.
### Example Code
The following is a crude straw-man example of how the new mechanism may
appear in code. The idea is to define a builder that gathers information about
the error:
```
throw DruidException
.userError()
.error(DruidException.INVALID_SQL)
.message(“The table ‘%s’ is not defined in schema ‘%s’.", tableName,
schemaName)
.context("Line %d, column %d", node.lineNo, node.columnNo)
.build();
```
The `build()` call creates the exception and logs it. Log level is
determined the the error kind. Here, we raise a user exception, which needs
only, say, `info` logging level. An `internalError()` might be logged at the
`error` level. The builder can offer an exception, a `logLevel()` method that
lets the developer override the general rules.
### Roll-out Plan
This proposal requires two distinct parts:
1. Design the `DruidException` mechanism itself.
2. Incrementally upgrade existing exceptions based on priority (frequency of
occurrence.)
Of course, the process is iterative: converting the first few exceptions
will certainly identify enhancements we’d like in the exception mechanism.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]