[ https://issues.apache.org/jira/browse/CALCITE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320958#comment-16320958 ]
Julian Hyde commented on CALCITE-2130: -------------------------------------- I see the problem. The decorrelated query is evaluating the sub-query for all values of {{id}}, not just '203', and one of those values gives an error. > Converting subquery to join is not always giving equivalent behavior > -------------------------------------------------------------------- > > Key: CALCITE-2130 > URL: https://issues.apache.org/jira/browse/CALCITE-2130 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.15.0 > Reporter: Paul Jackson > Assignee: Julian Hyde > > Starting with this DDL: > {code:SQL} > create table friends ( > id int not null, > friend_id int, > primary key (id), > foreign key (friend_id) references friends (id) > ); > insert into friends values > (210, null), > (209, 210), > (202, 209), > (208, 202), > (207, 209), > (203, 207), > (201, null), > (204, null), > (205, null), > (206, 209); > {code} > This query looks up a row by id and performs another lookup for id by the > friend id: > {code:SQL} > SELECT id, > (SELECT id > FROM friends as f2 > WHERE f2.friend_id = f1.friend_id) AS friend > FROM friends as f1 > WHERE id = '203' > id friend_id friend > 203 207 203 > {code} > This query only returns a result for rows with unique values for friend_id. > Replacing 203 with 202 leads to this error (MySql): {{Error Code: 1242. > Subquery returns more than 1 row}} > I converted this query to a RelNode and then converted it back to SQL using > the MySQL dialect: > {code:Java} > Planner aPlanner = Frameworks.getPlanner(aConfig); > SqlNode aQuery = aPlanner.parse(theSql); > aQuery = aPlanner.validate(aQuery); > RelNode aRelNode = aPlanner.rel(aQuery).project(); > RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect); > SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement(); > {code} > This gives the following plan and SQL: > {code} > LogicalProject(id=[$0], friend_id=[$1], friend=[$2]) > LogicalProject(id=[$0], friend_id=[$1], $f0=[$3]) > LogicalJoin(condition=[=($1, $2)], joinType=[left]) > LogicalFilter(condition=[=($0, '203')]) > JdbcTableScan(table=[[stardog, friends]]) > LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]) > LogicalProject(friend_id=[$1], id=[$0]) > LogicalProject(id=[$0], friend_id=[$1]) > LogicalFilter(condition=[IS NOT NULL($1)]) > JdbcTableScan(table=[[stardog, friends]]) > SELECT `t`.`id`, `t`.`friend_id`, `t2`.`$f1` AS `$f0` > FROM (SELECT * > FROM `stardog`.`friends` > WHERE `id` = '203') AS `t` > LEFT JOIN (SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN > `id` ELSE (SELECT NULL > UNION ALL > SELECT NULL) END AS `$f1` > FROM `stardog`.`friends` > WHERE `friend_id` IS NOT NULL > GROUP BY `friend_id`) AS `t2` ON `t`.`friend_id` = `t2`.`friend_id` > {code} > The MySQL implementation for SINGLE_VALUE is the CASE clause that causes an > 1242 error when the id count is greater than 1 by invoking a UNION ALL on two > NULL rows. In theory, this should return the ID when it is a unique value and > throw an error when there are multiple. Instead, MySQL will return the 1242 > error for all values of id, including 203. > Note, the JOIN subquery works if you add a WHERE clause expression to > constrain the value of freind_id: > {code:SQL} > SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE > (SELECT NULL > UNION ALL > SELECT NULL) END AS `$f1` > FROM `stardog`.`friends` > WHERE `friend_id` IS NOT NULL AND `friend_id` = '207' > GROUP BY `friend_id` > {code} > Substituting friend_id for 207 leads to the 1242 error, as intended. > This JOIN works on some dialects, but I think it is because different > dialects can use different join implementations. If the join performs the > JOIN ON SELECT clause without adding a where clause expression to constrain > friend_id (collecting all rows with a non-null friend_id), it will encounter > this error. Implementations that use a join algorithm that does constrain the > friend_id to the desired value will experience the error only when there are > multiple rows with the same friend_id, which behaves like the original query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)