[
https://issues.apache.org/jira/browse/CALCITE-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steven Talbot updated CALCITE-3593:
-----------------------------------
Description:
Best shown with the shell of a test in RelToSqlConverter test.
The following SQL on BigQuery
{code:java}
select product_id - 1000 as product_id
from (
select product_id, avg(gross_weight) as agw
from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
where net_weight < 100
group by product_id having product_id > 0){code}
produces one result, because the having filter applies to the product id before
subtraction, of course.
Running it through the machinery in that test
(`sql(query).withBigQuery().ok(expected)`) translates it to:
{noformat}
SELECT product_id - 1000 AS product_id
from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
WHERE net_weight < 100
GROUP BY product_id
HAVING product_id > 0{noformat}
This changes the meaning of the query: now the HAVING is on the
after-subtraction product_id and you get no results, rather than the one result.
Note that this is _not_ true in HyperSQL, as it has different semantics around
the HAVING namespace.
{noformat}
select "product_id" - 1000 as "product_id"
from (
select "product_id", avg("gross_weight") as agw
from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM
(VALUES(0))) as product
where "net_weight" < 100
group by "product_id" having "product_id" > 0){noformat}
becomes
{noformat}
SELECT "product_id" - 1000 AS "product_id"
from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM
(VALUES(0))) as product
WHERE "net_weight" < 100
GROUP BY "product_id"
HAVING "product_id" > 0{noformat}
But the meaning is preserved: both return a row.
I'm not enough of a SQL standards expert to know which one is being more
compliant, but presumably both would have to be supported via some sort of flag?
I think the fix would be to force the subselect on dialects such as BigQuery
that have this behavior. Probably something that looks a lot like
[https://github.com/apache/calcite/blob/3530daaa8cad43aad6845b6c79e4bc1ca0e72f5f/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1043-L1047]
was:
Best shown with the shell of a test in RelToSqlConverter test.
The following SQL on BigQuery
{code:java}
select product_id - 1000 as product_id
from (
select product_id, avg(gross_weight) as agw
from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
where net_weight < 100
group by product_id having product_id > 0){code}
produces one result, because the having filter applies to the product id before
subtraction, of course.
Running it through the machinery in that test translates it to:
{noformat}
SELECT product_id - 1000 AS product_id
from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
WHERE net_weight < 100
GROUP BY product_id
HAVING product_id > 0{noformat}
This changes the meaning of the query: now the HAVING is on the
after-subtraction product_id and you get no results, rather than the one result.
Note that this is _not_ true in HyperSQL, as it has different semantics around
the HAVING namespace.
{noformat}
select "product_id" - 1000 as "product_id"
from (
select "product_id", avg("gross_weight") as agw
from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM
(VALUES(0))) as product
where "net_weight" < 100
group by "product_id" having "product_id" > 0){noformat}
becomes
{noformat}
SELECT "product_id" - 1000 AS "product_id"
from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM
(VALUES(0))) as product
WHERE "net_weight" < 100
GROUP BY "product_id"
HAVING "product_id" > 0{noformat}
But the meaning is preserved: both return a row.
I'm not enough of a SQL standards expert to know which one is being more
compliant, but presumably both would have to be supported via some sort of flag?
I think the fix would be to force the subselect on dialects such as BigQuery
that have this behavior. Probably something that looks a lot like
[https://github.com/apache/calcite/blob/3530daaa8cad43aad6845b6c79e4bc1ca0e72f5f/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1043-L1047]
> RelToSqlConverter changes target of ambiguous having clause with a Project on
> top of an Aggregate
> -------------------------------------------------------------------------------------------------
>
> Key: CALCITE-3593
> URL: https://issues.apache.org/jira/browse/CALCITE-3593
> Project: Calcite
> Issue Type: Bug
> Reporter: Steven Talbot
> Priority: Major
>
> Best shown with the shell of a test in RelToSqlConverter test.
> The following SQL on BigQuery
> {code:java}
> select product_id - 1000 as product_id
> from (
> select product_id, avg(gross_weight) as agw
> from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as
> product
> where net_weight < 100
> group by product_id having product_id > 0){code}
> produces one result, because the having filter applies to the product id
> before subtraction, of course.
> Running it through the machinery in that test
> (`sql(query).withBigQuery().ok(expected)`) translates it to:
> {noformat}
> SELECT product_id - 1000 AS product_id
> from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as
> product
> WHERE net_weight < 100
> GROUP BY product_id
> HAVING product_id > 0{noformat}
> This changes the meaning of the query: now the HAVING is on the
> after-subtraction product_id and you get no results, rather than the one
> result.
> Note that this is _not_ true in HyperSQL, as it has different semantics
> around the HAVING namespace.
> {noformat}
> select "product_id" - 1000 as "product_id"
> from (
> select "product_id", avg("gross_weight") as agw
> from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight"
> FROM (VALUES(0))) as product
> where "net_weight" < 100
> group by "product_id" having "product_id" > 0){noformat}
> becomes
> {noformat}
> SELECT "product_id" - 1000 AS "product_id"
> from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight"
> FROM (VALUES(0))) as product
> WHERE "net_weight" < 100
> GROUP BY "product_id"
> HAVING "product_id" > 0{noformat}
> But the meaning is preserved: both return a row.
> I'm not enough of a SQL standards expert to know which one is being more
> compliant, but presumably both would have to be supported via some sort of
> flag?
> I think the fix would be to force the subselect on dialects such as BigQuery
> that have this behavior. Probably something that looks a lot like
> [https://github.com/apache/calcite/blob/3530daaa8cad43aad6845b6c79e4bc1ca0e72f5f/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1043-L1047]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)