Hello everyone,

My name is Bolek and I work at StarTree. Recently I've been trying to improve the error reporting of Apache Pinot's SQL parser, which is a slightly customized version of Calcite's Babel parser. I noticed that in many cases (probably most) the error information (position, token, and list of expected tokens) is wrong. For instance, a SQL command such as:
WITH grouping AS (SELECT 1) select * from grouping;

produces the following useless message:

org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception while parsing query: WITH grouping AS (SELECT 1) select * from grouping
Caused by: org.apache.pinot.sql.parsers.parser.ParseException: Encountered "" at line 1, column 1.
Was expecting one of:
    (empty)

Oftentimes, the reported error position and token are one token earlier than they should be, and the list of expected productions is hundreds of items long.

The issue seems to be caused by using a global lookahead value of 2 while having a long list of nonReservedKeywords. Once I switched the lookahead to 1, the JavaCC Maven plugin started to emit a large number of conflicts between regular productions and nonReservedKeywords. I managed to fix those by adding lots of LOOKAHEADs to the grammar (as can be seen at https://github.com/apache/pinot/pull/14238/files#diff-5de5043229de15ff630c4920d392a058098fa3f54793df4799734c0a4f908732 ), but that makes the grammar harder to keep in sync with Calcite's.

Has anyone worked on a similar issue, or could anyone suggest a better approach? If the approach makes sense, would Calcite be open to a similar change to the grammar?

Please let me know if there's a better place for discussing such issues.

Best regards,
Bolek Ziobrowski