morrySnow opened a new pull request, #63225:
URL: https://github.com/apache/doris/pull/63225
## Summary
When a user writes a statement like `CREATE DATABASE load`, Doris previously
produced an overwhelming ANTLR-generated error listing hundreds of expected
tokens:
```
mismatched input 'load' expecting {'{', '}', 'ACTIONS', 'AFTER',
'AGG_STATE', 'AGGREGATE', ...hundreds of tokens...}(line 1, pos 16)
```
This message is cryptic and gives no actionable guidance.
## Root Cause
`LOAD` is a reserved keyword, so the parser receives a keyword token where it
expects an identifier. Because non-reserved keywords are also accepted as
identifiers, ANTLR then generates a huge "expecting {all non-reserved
keywords}" list, which is useless to the user.
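
To illustrate the root cause, here is a minimal Java sketch of how a word ends up as a keyword token instead of an identifier. The class `KeywordLexSketch`, its `RESERVED` set, and the `classify` method are hypothetical and not part of Doris; the real keyword list lives in the grammar and is much larger.

```java
import java.util.Locale;
import java.util.Set;

final class KeywordLexSketch {
    // Hypothetical two-word subset; the PR only confirms that LOAD and SELECT are reserved.
    static final Set<String> RESERVED = Set.of("load", "select");

    enum Kind { KEYWORD, IDENTIFIER, BACKQUOTED_IDENTIFIER }

    /** Classifies a single word the way a keyword-aware lexer roughly would. */
    static Kind classify(String text) {
        // Backtick quoting wins first, so `load` is an identifier, not a keyword.
        if (text.length() >= 2 && text.startsWith("`") && text.endsWith("`")) {
            return Kind.BACKQUOTED_IDENTIFIER;
        }
        // Keyword matching is case-insensitive in this sketch.
        if (RESERVED.contains(text.toLowerCase(Locale.ROOT))) {
            return Kind.KEYWORD;
        }
        return Kind.IDENTIFIER;
    }

    /** A position that requires an identifier rejects reserved keyword tokens. */
    static boolean usableAsName(String text) {
        return classify(text) != Kind.KEYWORD;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"load", "`load`", "sales"}) {
            System.out.println(word + " -> " + classify(word));
        }
    }
}
```

In this model, `CREATE DATABASE load` fails because the database name slot receives a `KEYWORD`, while the backquoted form passes.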
## Research: Other Databases
| Database | Error Message |
|---|---|
| **BigQuery** | `Syntax error: Unexpected keyword LOAD at [1:17]` (names the keyword explicitly) |
| **PostgreSQL/DuckDB** | `syntax error at or near "load"` (short and concise) |
| **Spark SQL** | Suggests backtick quoting for keyword-as-identifier |
| **Trino** | Same verbose ANTLR output (same problem) |
## Fix
Improved `ParseErrorListener` to:
1. **Detect reserved-keyword-as-identifier errors**: When an
`InputMismatchException` fires, the expected tokens contain
`IDENTIFIER`/`BACKQUOTED_IDENTIFIER`, and the offending token both has a
grammar literal name and looks like a word (not punctuation such as `;`),
emit a targeted message.
2. **New message format** (inspired by BigQuery + Spark):
```
Syntax error near 'load': 'load' is a reserved keyword and cannot be used
as an identifier without quoting.
If you want to use 'load' as an identifier, please use backtick quotes:
`load`
(line 1, pos 16)
```
3. **Trim long expected-token lists**: For other mismatch errors where the
expected-token list exceeds 200 characters, strip the list from the message
to avoid overwhelming users.
4. **pom.xml**: Added default `<argLine/>` property so Maven Surefire can
run tests without the JaCoCo coverage profile
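
The detection rule in item 1 and the message in item 2 can be sketched roughly as follows. This is a standalone approximation that does not use the ANTLR runtime: the class `ReservedKeywordHintSketch`, the method names, and the plain `Set<String>` of expected token names are all hypothetical, and treating "either `IDENTIFIER` or `BACKQUOTED_IDENTIFIER` is expected" as sufficient is an assumption. The line breaks in the generated message are also a guess, since the example above is wrapped by the email.

```java
import java.util.Optional;
import java.util.Set;

final class ReservedKeywordHintSketch {
    /** True for word-like tokens such as load; false for punctuation such as ';'. */
    static boolean looksLikeWord(String text) {
        return text.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    /**
     * Returns the targeted message when the mismatch looks like a reserved keyword
     * used as an identifier, or empty so the caller can fall back to the default error.
     *
     * @param expected       names of the token types the parser would have accepted
     * @param offending      text of the token that caused the mismatch
     * @param hasLiteralName whether the grammar defines a literal name for that token
     */
    static Optional<String> keywordHint(Set<String> expected, String offending,
                                        boolean hasLiteralName, int line, int pos) {
        // Assumption: either identifier token type being expected is enough.
        boolean identifierExpected = expected.contains("IDENTIFIER")
                || expected.contains("BACKQUOTED_IDENTIFIER");
        if (!identifierExpected || !hasLiteralName || !looksLikeWord(offending)) {
            return Optional.empty();
        }
        String message = "Syntax error near '" + offending + "': '" + offending
                + "' is a reserved keyword and cannot be used as an identifier without quoting.\n"
                + "If you want to use '" + offending
                + "' as an identifier, please use backtick quotes: `" + offending + "`\n"
                + "(line " + line + ", pos " + pos + ")";
        return Optional.of(message);
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("IDENTIFIER", "BACKQUOTED_IDENTIFIER");
        // CREATE DATABASE load: the offending token starts at position 16.
        System.out.println(keywordHint(expected, "load", true, 1, 16).orElse("<default error>"));
        // A stray ';' has a literal name but is not word-like, so no hint is produced.
        System.out.println(keywordHint(expected, ";", true, 1, 16).orElse("<default error>"));
    }
}
```

Returning an empty `Optional` for the non-matching cases keeps the default error path untouched in this sketch, which mirrors the intent that only reserved-keyword mismatches get the new wording.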
## Testing
Added `NereidsParserTest#testReservedKeywordAsIdentifierError`:
- Verifies `CREATE DATABASE load` produces the "reserved keyword" message with a backtick hint
- Verifies `CREATE DATABASE select` does likewise
- Verifies ``CREATE DATABASE `load` `` still parses successfully
Existing `testErrorListener` passes unchanged (its short expected-token list
is under the 200-char trim threshold).
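
The trim threshold mentioned above can be pictured with a small sketch. Only the 200-character limit comes from this PR; the class `ExpectedListTrimSketch`, the `formatMismatch` helper, and the exact text left after stripping are assumptions made for illustration.

```java
final class ExpectedListTrimSketch {
    // Threshold stated in the PR description.
    static final int MAX_EXPECTED_CHARS = 200;

    /**
     * Builds a mismatch message, dropping the expected-token list when it is
     * longer than the threshold. The message layout is an assumption modelled
     * on ANTLR's "mismatched input ... expecting ..." wording.
     */
    static String formatMismatch(String offending, String expectedList) {
        String base = "mismatched input '" + offending + "'";
        if (expectedList.length() > MAX_EXPECTED_CHARS) {
            return base; // long list stripped entirely
        }
        return base + " expecting " + expectedList;
    }

    public static void main(String[] args) {
        // A short list stays in the message, as in the existing testErrorListener case.
        System.out.println(formatMismatch("foo", "{'A', 'B'}"));

        // A long list (well over 200 characters) is removed.
        StringBuilder longList = new StringBuilder("{");
        for (int i = 0; i < 100; i++) {
            longList.append("'TOKEN', ");
        }
        longList.append("}");
        System.out.println(formatMismatch("foo", longList.toString()));
    }
}
```

Because short lists pass through unchanged, tests that assert on a brief expected-token list keep seeing the same text.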
### Check List (For Author)
- Test: Unit Test (added `NereidsParserTest#testReservedKeywordAsIdentifierError`)
- Behavior changed: Yes; parse errors for reserved-keyword-as-identifier show a human-friendly message instead of raw ANTLR output
- Does this need documentation: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]