paul-rogers opened a new pull request, #13798:
URL: https://github.com/apache/druid/pull/13798

   Issue #13123 suggests an enhanced format for Druid exceptions. This PR 
introduces that new format for SQL error messages. This PR is meant as a POC to 
solicit feedback about the approach.
   
   The key abstraction is `DruidException` which holds an error type, an error 
message and a context.
   
   The error type is a broad category including:
   
   * `USER`: the cause is invalid user input. For SQL, it is likely something 
wrong in an SQL statement.
   * `CONFIG`: the cause is a botched configuration: some property is invalid, 
some promised file does not exist, etc. The fix is for the admin to fix the 
config: there is little the end user can do (unless the end user is also the 
admin.)
   * `SYSTEM`: the cause is a bug in Druid. Essentially, an assertion failed. 
The only user solution is to lobby us to fix our bug, or to find a workaround.
   * And so on. See `DruidException` for the full list.
   
   The context consist of key/value pairs with additional information.
   
   The context consists of key-value pairs. Information goes here when it would 
clutter the error message. For example: line numbers, file names, port numbers, 
resource limits, etc. By putting information in the context, consumers can 
control the amount of information exposed to end users. (Maybe a site does not 
want to expose internal property values, file names, etc.)
   
   The exception is converted to JSON for return to the user. The format is 
superset of the existing error JSON format. The error type provides a clear 
mapping to HTTP status code. Since Druid has multiple ways to present errors 
(REST, JDBC, tests, etc.), conversion from the `DruidException` structure to 
output format is done via "encoder" classes. The `DruidException` format is 
meant to be RPC-agnostic.
   
   The `DruidException` class is meant to communicate errors to the user. It 
should be thrown when the receiver of the exception is the REST RPC layer. 
There are times when code wants to throw a specific exception to some other 
part of the code. Those cases should continue to use specific error classes. 
For example, if a cache throws a "not found" exception, that cause something to 
download a file and insert it into the cache, that code should _not_ use a 
`DruidException` as this is not an event that terminates an RPC request.
   
   ### Examples
   
   Here's an example for a SQL parse error:
   
   ```json
   {
      "type":"USER",
      "message":"Parse error: unexpected token INSERT",
      "errorMessage":"Parse error: unexpected token INSERT\nLine: 1\nColumn: 
8\nExpected: \"UNION\", \"INTERSECT\", \"EXCEPT\", \"MINUS\", \"ORDER\", 
\"LIMIT\", \"OFFSET\", \"FETCH\", \"STREAM\", \"DISTINCT\", \"ALL\", \"*\", 
\"+\", \"-\", \"NOT\", \"EXISTS\", <UNSIGNED_INTEGER_LITERAL>, 
<DECIMAL_NUMERIC_LITERAL>, <APPROX_NUMERIC_LITERAL>, <BINARY_STRING_LITERAL>, 
<PREFIXED_STRING_LITERAL>, <QUOTED_STRING>, <UNICODE_STRING_LITERAL>, \"TRUE\", 
\"FALSE\", \"UNKNOWN\", \"NULL\", <LBRACE_D>, <LBRACE_T>, <LBRACE_TS>, 
\"DATE\", \"TIME\", \"TIMESTAMP\", \"INTERVAL\", \"?\", \"CAST\", \"EXTRACT\", 
\"POSITION\", \"CONVERT\", \"TRANSLATE\", \"OVERLAY\", \"FLOOR\", \"CEIL\", 
\"CEILING\", \"SUBSTRING\", \"TRIM\", \"CLASSIFIER\", \"MATCH_NUMBER\", 
\"RUNNING\", \"PREV\", \"NEXT\", \"JSON_EXISTS\", \"JSON_VALUE\", 
\"JSON_QUERY\", \"JSON_OBJECT\", \"JSON_OBJECTAGG\", \"JSON_ARRAY\", 
\"JSON_ARRAYAGG\", <LBRACE_FN>, \"MULTISET\", \"ARRAY\", \"PERIOD\", 
\"SPECIFIC\", <IDENTIFIER>, <QUOTED_IDENTIFIER>, 
 <BACK_QUOTED_IDENTIFIER>, <BRACKET_QUOTED_IDENTIFIER>, 
<UNICODE_QUOTED_IDENTIFIER>, \"ABS\", \"AVG\", \"CARDINALITY\", 
\"CHAR_LENGTH\", \"CHARACTER_LENGTH\", \"COALESCE\", \"COLLECT\", 
\"COVAR_POP\", \"COVAR_SAMP\", \"CUME_DIST\", \"COUNT\", \"CURRENT_DATE\", 
\"CURRENT_TIME\", \"CURRENT_TIMESTAMP\", \"DENSE_RANK\", \"ELEMENT\", \"EXP\", 
\"FIRST_VALUE\", \"FUSION\", \"GROUPING\", \"HOUR\", \"LAG\", \"LEAD\", 
\"LEFT\", \"LAST_VALUE\", \"LN\", \"LOCALTIME\", \"LOCALTIMESTAMP\", \"LOWER\", 
\"MAX\", \"MIN\", \"MINUTE\", \"MOD\", \"MONTH\", \"NTH_VALUE\", \"NTILE\", 
\"NULLIF\", \"OCTET_LENGTH\", \"PERCENT_RANK\", \"POWER\", \"RANK\", 
\"REGR_COUNT\", \"REGR_SXX\", \"REGR_SYY\", \"RIGHT\", \"ROW_NUMBER\", 
\"SECOND\", \"SQRT\", \"STDDEV_POP\", \"STDDEV_SAMP\", \"SUM\", \"UPPER\", 
\"TRUNCATE\", \"USER\", \"VAR_POP\", \"VAR_SAMP\", \"YEAR\", 
\"CURRENT_CATALOG\", \"CURRENT_DEFAULT_TRANSFORM_GROUP\", \"CURRENT_PATH\", 
\"CURRENT_ROLE\", \"CURRENT_SCHEMA\", \"CURRENT_USER\", \"SESSION_USER\", \"SY
 STEM_USER\", \"NEW\", \"CASE\", \"CURRENT\", \"CURSOR\", \"ROW\", \"(\" 
...\nError Code: SQL parse failed",
      "errorCode":"SQL parse failed",
      "context":{
         "Line":"1",
         "Expected":"\"UNION\", \"INTERSECT\", \"EXCEPT\", \"MINUS\", 
\"ORDER\", \"LIMIT\", \"OFFSET\", \"FETCH\", \"STREAM\", \"DISTINCT\", \"ALL\", 
\"*\", \"+\", \"-\", \"NOT\", \"EXISTS\", <UNSIGNED_INTEGER_LITERAL>, 
<DECIMAL_NUMERIC_LITERAL>, <APPROX_NUMERIC_LITERAL>, <BINARY_STRING_LITERAL>, 
<PREFIXED_STRING_LITERAL>, <QUOTED_STRING>, <UNICODE_STRING_LITERAL>, \"TRUE\", 
\"FALSE\", \"UNKNOWN\", \"NULL\", <LBRACE_D>, <LBRACE_T>, <LBRACE_TS>, 
\"DATE\", \"TIME\", \"TIMESTAMP\", \"INTERVAL\", \"?\", \"CAST\", \"EXTRACT\", 
\"POSITION\", \"CONVERT\", \"TRANSLATE\", \"OVERLAY\", \"FLOOR\", \"CEIL\", 
\"CEILING\", \"SUBSTRING\", \"TRIM\", \"CLASSIFIER\", \"MATCH_NUMBER\", 
\"RUNNING\", \"PREV\", \"NEXT\", \"JSON_EXISTS\", \"JSON_VALUE\", 
\"JSON_QUERY\", \"JSON_OBJECT\", \"JSON_OBJECTAGG\", \"JSON_ARRAY\", 
\"JSON_ARRAYAGG\", <LBRACE_FN>, \"MULTISET\", \"ARRAY\", \"PERIOD\", 
\"SPECIFIC\", <IDENTIFIER>, <QUOTED_IDENTIFIER>, <BACK_QUOTED_IDENTIFIER>, 
<BRACKET_QUOTED_IDENTIFIER>, <UNICODE_QUOTE
 D_IDENTIFIER>, \"ABS\", \"AVG\", \"CARDINALITY\", \"CHAR_LENGTH\", 
\"CHARACTER_LENGTH\", \"COALESCE\", \"COLLECT\", \"COVAR_POP\", \"COVAR_SAMP\", 
\"CUME_DIST\", \"COUNT\", \"CURRENT_DATE\", \"CURRENT_TIME\", 
\"CURRENT_TIMESTAMP\", \"DENSE_RANK\", \"ELEMENT\", \"EXP\", \"FIRST_VALUE\", 
\"FUSION\", \"GROUPING\", \"HOUR\", \"LAG\", \"LEAD\", \"LEFT\", 
\"LAST_VALUE\", \"LN\", \"LOCALTIME\", \"LOCALTIMESTAMP\", \"LOWER\", \"MAX\", 
\"MIN\", \"MINUTE\", \"MOD\", \"MONTH\", \"NTH_VALUE\", \"NTILE\", \"NULLIF\", 
\"OCTET_LENGTH\", \"PERCENT_RANK\", \"POWER\", \"RANK\", \"REGR_COUNT\", 
\"REGR_SXX\", \"REGR_SYY\", \"RIGHT\", \"ROW_NUMBER\", \"SECOND\", \"SQRT\", 
\"STDDEV_POP\", \"STDDEV_SAMP\", \"SUM\", \"UPPER\", \"TRUNCATE\", \"USER\", 
\"VAR_POP\", \"VAR_SAMP\", \"YEAR\", \"CURRENT_CATALOG\", 
\"CURRENT_DEFAULT_TRANSFORM_GROUP\", \"CURRENT_PATH\", \"CURRENT_ROLE\", 
\"CURRENT_SCHEMA\", \"CURRENT_USER\", \"SESSION_USER\", \"SYSTEM_USER\", 
\"NEW\", \"CASE\", \"CURRENT\", \"CURSOR\", \"ROW\", \"(
 \" ...",
         "Column":"8"
      }
   }
   ```
   
   A client should display the above message as:
   
   ```text
   Parse error: unexpected token INSERT
   Line: 1
   Column: 8
   Expected: "UNION", "INTERSECT", "EXCEPT", "MINUS", "ORDER", "LIMIT", 
"OFFSET", "FETCH", "STREAM", "DISTINCT", "ALL", "*", "+", "-", "NOT", "EXISTS", 
<UNSIGNED_INTEGER_LITERAL>, <DECIMAL_NUMERIC_LITERAL>, 
<APPROX_NUMERIC_LITERAL>, <BINARY_STRING_LITERAL>, <PREFIXED_STRING_LITERAL>, 
<QUOTED_STRING>, <UNICODE_STRING_LITERAL>, "TRUE", "FALSE", "UNKNOWN", "NULL", 
<LBRACE_D>, <LBRACE_T>, <LBRACE_TS>, "DATE", "TIME", "TIMESTAMP", "INTERVAL", 
"?", "CAST", "EXTRACT", "POSITION", "CONVERT", "TRANSLATE", "OVERLAY", "FLOOR", 
"CEIL", "CEILING", "SUBSTRING", "TRIM", "CLASSIFIER", "MATCH_NUMBER", 
"RUNNING", "PREV", "NEXT", "JSON_EXISTS", "JSON_VALUE", "JSON_QUERY", 
"JSON_OBJECT", "JSON_OBJECTAGG", "JSON_ARRAY", "JSON_ARRAYAGG", <LBRACE_FN>, 
"MULTISET", "ARRAY", "PERIOD", "SPECIFIC", <IDENTIFIER>, <QUOTED_IDENTIFIER>, 
<BACK_QUOTED_IDENTIFIER>, <BRACKET_QUOTED_IDENTIFIER>, 
<UNICODE_QUOTED_IDENTIFIER>, "ABS", "AVG", "CARDINALITY", "CHAR_LENGTH", 
"CHARACTER_LENGTH", "COALESCE", "COLLECT", "COVAR_POP
 ", "COVAR_SAMP", "CUME_DIST", "COUNT", "CURRENT_DATE", "CURRENT_TIME", 
"CURRENT_TIMESTAMP", "DENSE_RANK", "ELEMENT", "EXP", "FIRST_VALUE", "FUSION", 
"GROUPING", "HOUR", "LAG", "LEAD", "LEFT", "LAST_VALUE", "LN", "LOCALTIME", 
"LOCALTIMESTAMP", "LOWER", "MAX", "MIN", "MINUTE", "MOD", "MONTH", "NTH_VALUE", 
"NTILE", "NULLIF", "OCTET_LENGTH", "PERCENT_RANK", "POWER", "RANK", 
"REGR_COUNT", "REGR_SXX", "REGR_SYY", "RIGHT", "ROW_NUMBER", "SECOND", "SQRT", 
"STDDEV_POP", "STDDEV_SAMP", "SUM", "UPPER", "TRUNCATE", "USER", "VAR_POP", 
"VAR_SAMP", "YEAR", "CURRENT_CATALOG", "CURRENT_DEFAULT_TRANSFORM_GROUP", 
"CURRENT_PATH", "CURRENT_ROLE", "CURRENT_SCHEMA", "CURRENT_USER", 
"SESSION_USER", "SYSTEM_USER", "NEW", "CASE", "CURRENT", "CURSOR", "ROW", "(" 
...
   ```
   
   Here is an example for a Calcite validation error:
   
   ```json
   {
      "type":"USER",
      "message":"Column 'foo' not found in any table",
      "errorMessage":"Column 'foo' not found in any table\nLine: 1\nColumn: 
8\nError Code: SQL parse failed",
      "errorCode":"Plan validation failed",
      "context":{
         "Line":"1",
         "Column":"8"
      }
   }
   ```
   
   With the displayed result as:
   
   ```text
   Column 'foo' not found in any table
   Line: 1
   Column: 8
   ```
   
   ### Specific Changes
   
   All places in the SQL module that throw exceptions where modified to throw a 
`DruidException` instead. Some minor effort went into tidying up error 
messages: more could be done on this front.
   
   No exceptions in other modules were converted. Doing so is a big project 
that we'd consider only if we like how the SQL conversion works out. There are 
places in Druid where REST errors are inconsistent: the error message is HTML, 
or text, or an ad-hoc JSON format. Over time, we'd want to unify these to 
always return the JSON `DruidException` format. None of that work is reflected 
here.
   
   A `RestExceptionEncoder` interface provides an implementation to format a 
`DruidException` for return as a REST error object. The 
`StandardRestExceptionEncoder` includes all the information. A site-specific 
implementation could "sanitize" the error by, say, converting all `SYSTEM` 
errors to "Internal error. Please contact Support for assistance." Or, hide 
some of the context information, as noted above. For now, 
`StandardRestExceptionEncoder` is hard-coded. It would eventually be configured 
via Guice so an extension can replace it.
   
   A recent PR cleaned up execution errors for native queries. Eventually, that 
code should be revised to use `DruidException`, but that work has not yet been 
done. For now, `DruidException` is yet another exception that the 
`QueryResultPusher` has to handle.
   
   ### Release note
   
   This is a POC, so no release note yet. If we like where this is going, we'll 
want to inform users of the change to the REST error format.
   
   <hr>
   This PR has:
   
   - [X] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [X] a release note entry in the PR description.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to