[
https://issues.apache.org/jira/browse/CALCITE-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darpan Lunagariya (e6data computing) updated CALCITE-7533:
----------------------------------------------------------
Description:
The parser fails on queries where a WITH clause is followed by a parenthesized
query expression, even though this is permitted by {{SQL:2016 (<query
expression body> → ( <query expression body> ))}} and accepted by PostgreSQL.
Example failing query:
{code:java}
SELECT * FROM (
WITH q AS (SELECT 1 AS id)
(SELECT id FROM q) UNION ALL (SELECT id FROM q)
) t
{code}
Error:
{code:java}
Encountered "(" at line 3, column 3.
Was expecting one of:
"SELECT" ...
"TABLE" ...
"VALUE" ...
"VALUES" ...
"," ...
{code}
Root cause:
In core/src/main/codegen/templates/Parser.jj, the Query production only allows
LeafQuery (i.e. SELECT / VALUES / TABLE) as the body after a WithList:
{code:java}
[ withList = WithList() ]
e = LeafQuery(exprContext) { list.add(e); }
( AddSetOpQuery(list, exprContext) )*
{code}
This is asymmetric in two ways:
1. The parallel QueryOrExpr rule already uses LeafQueryOrExpr after WithList,
so a parenthesized query is allowed there.
2. AddSetOpQuery (used for the 2nd…Nth set-op operands) also uses
LeafQueryOrExpr, so the parser accepts … UNION ALL (SELECT 1) but rejects
(SELECT 1) UNION ALL … when it appears as the first operand after WITH.
Without WITH, the same shape parses fine because it goes through
ExprOrJoinOrOrderedQuery's alt-2 (TableRef1) path:
(SELECT 1) UNION ALL (SELECT 2) -- OK today
Proposed fix:
Branch Query so the post-WithList body uses `LeafQueryOrExpr` (mirroring
QueryOrExpr), while the no-WITH path keeps LeafQuery. This preserves Query's
first-set (SELECT, VALUE, VALUES, TABLE), so the LOOKAHEAD(2) decision in
ExprOrJoinOrOrderedQuery is unchanged and existing parses (e.g. (SELECT 1)
UNION ALL (SELECT 2)) take the same path and produce the same tree.
{code:java}
SqlNode Query(ExprContext exprContext) :
{
SqlNodeList withList = null;
final SqlNode e;
final List<Object> list = new ArrayList<Object>();
}
{
(
withList = WithList()
e = LeafQueryOrExpr(exprContext) { list.add(e); }
|
e = LeafQuery(exprContext) { list.add(e); }
)
( AddSetOpQuery(list, exprContext) )*
{ return addWith(withList, SqlParserUtil.toTree(list)); }
}
{code}
h3. Note on permissiveness:
Using LeafQueryOrExpr after WithList inherits the same looseness QueryOrExpr
already has: a non-query body such as WITH q AS (SELECT 1) 1+2 will now parse
successfully and then be rejected by the validator (rather than rejected at
parse time). This is intentional and consistent with Calcite's existing "parse
permissively, validate strictly" approach for WITH — QueryOrExpr exhibits the
same behavior today, so this isn't a new failure mode, just a new place it
surfaces. Tightening the grammar to reject this at parse time would duplicate
semantic checks that the validator already performs.
was:
The parser fails on queries where a WITH clause is followed by a parenthesized
query expression, even though this is permitted by {{SQL:2016 (<query
expression body> → ( <query expression body> ))}} and accepted by PostgreSQL.
Example failing query:
{code:java}
SELECT * FROM (
WITH q AS (SELECT 1 AS id)
(SELECT id FROM q) UNION ALL (SELECT id FROM q)
) t
{code}
Error:
{code:java}
Encountered "(" at line 3, column 3.
Was expecting one of:
"SELECT" ...
"TABLE" ...
"VALUE" ...
"VALUES" ...
"," ...
{code}
Root cause:
In core/src/main/codegen/templates/Parser.jj, the Query production only allows
LeafQuery (i.e. SELECT / VALUES / TABLE) as the body after a WithList:
{code:java}
[ withList = WithList() ]
e = LeafQuery(exprContext) { list.add(e); }
( AddSetOpQuery(list, exprContext) )*
{code}
This is asymmetric in two ways:
1. The parallel QueryOrExpr rule already uses LeafQueryOrExpr after WithList,
so a parenthesized query is allowed there.
2. AddSetOpQuery (used for the 2nd…Nth set-op operands) also uses
LeafQueryOrExpr, so the parser accepts … UNION ALL (SELECT 1) but rejects
(SELECT 1) UNION ALL … when it appears as the first operand after WITH.
Without WITH, the same shape parses fine because it goes through
ExprOrJoinOrOrderedQuery's alt-2 (TableRef1) path:
(SELECT 1) UNION ALL (SELECT 2) -- OK today
Proposed fix:
Branch Query so the post-WithList body uses `LeafQueryOrExpr` (mirroring
QueryOrExpr), while the no-WITH path keeps LeafQuery. This preserves Query's
first-set (SELECT, VALUE, VALUES, TABLE), so the LOOKAHEAD(2) decision in
ExprOrJoinOrOrderedQuery is unchanged and existing parses (e.g. (SELECT 1)
UNION ALL (SELECT 2)) take the same path and produce the same tree.
{code:java}
SqlNode Query(ExprContext exprContext) :
{
SqlNodeList withList = null;
final SqlNode e;
final List<Object> list = new ArrayList<Object>();
}
{
(
withList = WithList()
e = LeafQueryOrExpr(exprContext) { list.add(e); }
|
e = LeafQuery(exprContext) { list.add(e); }
)
( AddSetOpQuery(list, exprContext) )*
{ return addWith(withList, SqlParserUtil.toTree(list)); }
}
{code}
> Parser rejects parenthesized query as the body of a WITH clause
> ---------------------------------------------------------------
>
> Key: CALCITE-7533
> URL: https://issues.apache.org/jira/browse/CALCITE-7533
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Darpan Lunagariya (e6data computing)
> Priority: Minor
>
> The parser fails on queries where a WITH clause is followed by a
> parenthesized query expression, even though this is permitted by {{SQL:2016
> (<query expression body> → ( <query expression body> ))}} and accepted by
> PostgreSQL.
> Example failing query:
> {code:java}
> SELECT * FROM (
> WITH q AS (SELECT 1 AS id)
> (SELECT id FROM q) UNION ALL (SELECT id FROM q)
> ) t
> {code}
> Error:
> {code:java}
> Encountered "(" at line 3, column 3.
> Was expecting one of:
> "SELECT" ...
> "TABLE" ...
> "VALUE" ...
> "VALUES" ...
> "," ...
> {code}
> Root cause:
> In core/src/main/codegen/templates/Parser.jj, the Query production only
> allows LeafQuery (i.e. SELECT / VALUES / TABLE) as the body after a WithList:
> {code:java}
> [ withList = WithList() ]
> e = LeafQuery(exprContext) { list.add(e); }
> ( AddSetOpQuery(list, exprContext) )*
> {code}
> This is asymmetric in two ways:
> 1. The parallel QueryOrExpr rule already uses LeafQueryOrExpr after WithList,
> so a parenthesized query is allowed there.
> 2. AddSetOpQuery (used for the 2nd…Nth set-op operands) also uses
> LeafQueryOrExpr, so the parser accepts … UNION ALL (SELECT 1) but rejects
> (SELECT 1) UNION ALL … when it appears as the first operand after WITH.
> Without WITH, the same shape parses fine because it goes through
> ExprOrJoinOrOrderedQuery's alt-2 (TableRef1) path:
> (SELECT 1) UNION ALL (SELECT 2) -- OK today
> Proposed fix:
> Branch Query so the post-WithList body uses `LeafQueryOrExpr` (mirroring
> QueryOrExpr), while the no-WITH path keeps LeafQuery. This preserves Query's
> first-set (SELECT, VALUE, VALUES, TABLE), so the LOOKAHEAD(2) decision in
> ExprOrJoinOrOrderedQuery is unchanged and existing parses (e.g. (SELECT 1)
> UNION ALL (SELECT 2)) take the same path and produce the same tree.
> {code:java}
> SqlNode Query(ExprContext exprContext) :
> {
> SqlNodeList withList = null;
> final SqlNode e;
> final List<Object> list = new ArrayList<Object>();
> }
> {
> (
> withList = WithList()
> e = LeafQueryOrExpr(exprContext) { list.add(e); }
> |
> e = LeafQuery(exprContext) { list.add(e); }
> )
> ( AddSetOpQuery(list, exprContext) )*
> { return addWith(withList, SqlParserUtil.toTree(list)); }
> }
> {code}
> h3. Note on permissiveness:
>
> Using LeafQueryOrExpr after WithList inherits the same looseness QueryOrExpr
> already has: a non-query body such as WITH q AS (SELECT 1) 1+2 will now parse
> successfully and then be rejected by the validator (rather than rejected at
> parse time). This is intentional and consistent with Calcite's existing
> "parse permissively, validate strictly" approach for WITH — QueryOrExpr
> exhibits the same behavior today, so this isn't a new failure mode, just a
> new place it surfaces. Tightening the grammar to reject this at parse time
> would duplicate semantic checks that the validator already performs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)