Hello everyone.
My name is Bolek and I work at StarTree.
Recently I've been trying to improve the error reporting of Apache Pinot's
SQL parser, which is a slightly customized version of Calcite's Babel parser.
I noticed that in many cases (probably most) the error information
(position, token, and list of expected tokens) is wrong.
For instance, a SQL command such as:

WITH grouping AS (SELECT 1) select * from grouping;

produces the following useless message:
org.apache.pinot.sql.parsers.SqlCompilationException:
 Caught exception while parsing query: WITH grouping AS (SELECT 1) select *
from grouping
Caused by: org.apache.pinot.sql.parsers.parser.ParseException:
 Encountered "" at line 1, column 1. Was expecting one of: (empty)

Often the reported error position and token are one token earlier than
they should be, and the list of expected productions is hundreds of
items long.

The issue seems to be caused by using a global lookahead value of 2 while
having a long list of nonReservedKeywords.
Once I switched the lookahead to 1, the JavaCC Maven plugin started to
report a large number of conflicts between regular productions and
nonReservedKeywords.
I managed to fix those by adding lots of LOOKAHEADs to the grammar (as can
be seen at
https://github.com/apache/pinot/pull/14238/files#diff-5de5043229de15ff630c4920d392a058098fa3f54793df4799734c0a4f908732
),
but that makes the grammar harder to keep in sync with Calcite's.
Has anyone worked on a similar issue, or could anyone suggest a better
approach?
If the approach makes sense, would Calcite be open to a similar change to
the grammar?
Please let me know if there's a better place for discussing such issues.

Best regards,
Bolek Ziobrowski
