Paul Jackson created CALCITE-2129:
-------------------------------------

             Summary: Aggregating function from subquery not projected properly
                 Key: CALCITE-2129
                 URL: https://issues.apache.org/jira/browse/CALCITE-2129
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.15.0
            Reporter: Paul Jackson
            Assignee: Julian Hyde


Starting with this DDL:
{code:SQL}
create table friends (
  id int not null,
  friend_id int,
  primary key (id),
  foreign key (friend_id) references friends (id)
);

insert into friends values
(210, null),
(209, 210),
(202, 209),
(208, 202),
(207, 209),
(203, 207),
(201, null),
(204, null),
(205, null),
(206, 209);
{code}

This query gives the highest id of friends that share friend_id for each friend:
{code:SQL}
SELECT id, friend_id,
 (SELECT max(f2.id)
  FROM friends AS f2
  WHERE f2.friend_id = f1.friend_id) AS foaf_id
FROM friends AS f1

id      friend  foaf
201             
202     209     207
203     207     203
204             
205             
206     209     207
207     209     207
208     202     208
209     210     209
210
{code}  

I convert this query to a RelNode and then converted it back to SQL:
{code:Java}
Planner aPlanner = Frameworks.getPlanner(aConfig);
SqlNode aQuery = aPlanner.parse(theSql);
aQuery = aPlanner.validate(aQuery);
RelNode aRelNode = aPlanner.rel(aQuery).project();
RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
{code}

This gives the following plan and SQL:
{code}
LogicalProject(id=[$0], friend_id=[$1], foaf_id=[$2])
  LogicalProject(id=[$0], friend_id=[$1], EXPR$0=[$3])
    LogicalJoin(condition=[=($1, $2)], joinType=[left])
      JdbcTableScan(table=[[stardog, friends]])
      LogicalAggregate(group=[{0}], EXPR$0=[MAX($1)])
        LogicalProject(friend_id=[$1], id=[$0])
          LogicalProject(id=[$0], friend_id=[$1])
            LogicalFilter(condition=[IS NOT NULL($1)])
              JdbcTableScan(table=[[stardog, friends]])

SELECT `friends`.`id`, `friends`.`friend_id`, MAX(`id`)
FROM `stardog`.`friends`
LEFT JOIN (SELECT `friend_id`, MAX(`id`)
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL
GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`
{code}

This is a bad conversion. The {{MAX(`id`)}} should not be repeated in the outer 
select. PostgreSQL will complain that the aggregating function requires a group 
by. MySQL returns the max id that has a non-null friend (208), that id's friend 
(202), and the max id of all rows (210):
{code}
id      friend  MAX(`id`)
208     202     210
{code}

I think the correct SQL should be:
{code:SQL}
SELECT `friends`.`id`, `friends`.`friend_id`, foaf_id
FROM `stardog`.`friends`
LEFT JOIN (SELECT `friend_id`, MAX(`id`) AS foaf_id
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL
GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`
{code}

There is code in {{SqlImplementor}} and {{RelToSqlConverter}} that uses an 
{{ordinalMap}} to track what functions are aliases as what identifiers. When 
the SQL is generated, the identifier is replaced with the function. If all that 
code is removed, this works as expected (using {{EXPR$0}} as the alias rather 
than {{foaf_id}}).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to