[ https://issues.apache.org/jira/browse/TAJO-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyunsik Choi reassigned TAJO-1600:
----------------------------------
Assignee: Hyunsik Choi
> Invalid query planning for distinct group-by
> --------------------------------------------
>
> Key: TAJO-1600
> URL: https://issues.apache.org/jira/browse/TAJO-1600
> Project: Tajo
> Issue Type: Bug
> Components: Planner/Optimizer
> Reporter: Jihoon Son
> Assignee: Hyunsik Choi
> Priority: Critical
> Fix For: 0.11.0
>
>
> For a query involving a distinct operator, the group-by is always executed as the
> last step of the query, even after ORDER BY. Consider the following example query.
> {noformat}
> default> select distinct a.col3 from test as a left outer join lineitem b on
> a.col1 = b.l_orderkey order by a.col3;
> {noformat}
> The plan for this query is as follows:
> {noformat}
> GROUP_BY(5)(col3)
> => target list: default.a.col3 (TEXT)
> => out schema:{(1) default.a.col3 (TEXT)}
> => in schema:{(1) default.a.col3 (TEXT)}
> SORT(3)
> => Sort Keys: default.a.col3 (TEXT) (asc)
> JOIN(7)(LEFT_OUTER)
> => Join Cond: default.a.col1 (INT4) = default.b.l_orderkey (INT4)
> => target list: default.a.col3 (TEXT)
> => out schema: {(1) default.a.col3 (TEXT)}
> => in schema: {(3) default.a.col3 (TEXT), default.a.col1 (INT4),
> default.b.l_orderkey (INT4)}
> SCAN(1) on default.lineitem_large as b
> => target list: default.b.l_orderkey (INT4)
> => out schema: {(1) default.b.l_orderkey (INT4)}
> => in schema: {(16) default.b.l_orderkey (INT4),
> default.b.l_partkey (INT4), default.b.l_suppkey (INT4),
> default.b.l_linenumber (INT4), default.b.l_quantity (FLOAT8),
> default.b.l_extendedprice (FLOAT8), default.b.l_discount (FLOAT8),
> default.b.l_tax (FLOAT8), default.b.l_returnflag (TEXT),
> default.b.l_linestatus (TEXT), default.b.l_shipdate (TEXT),
> default.b.l_commitdate (TEXT), default.b.l_receiptdate (TEXT),
> default.b.l_shipinstruct (TEXT), default.b.l_shipmode (TEXT),
> default.b.l_comment (TEXT)}
> PARTITIONS_SCAN(8) on default.testbroadcastmulticolumnpartitiontable as a
> => target list: default.a.col3 (TEXT), default.a.col1 (INT4)
> => num of filtered paths: 3
> => out schema: {(2) default.a.col3 (TEXT), default.a.col1 (INT4)}
> => in schema: {(2) default.a.col1 (INT4), default.a.col2 (FLOAT4)}
> => 0: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=01/col4=1996
> => 1: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=10/col4=1993
> => 2: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=12/col4=1996
> {noformat}
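The following is a minimal Python sketch, not Tajo code, of one way the reported plan shape (GROUP_BY above SORT) can produce wrongly ordered output. It assumes the distinct GROUP_BY runs as a shuffled aggregation that routes rows to partitions by key and concatenates the partition outputs. The partitioner `toy_partition`, the helper names and the sample `a.col3` values are hypothetical; the values only echo the partition paths in the plan above.

```python
from typing import Dict, List

# Hypothetical rows flowing out of the JOIN: values of a.col3, with duplicates
# (a left outer join can emit the same a.col3 value several times).
JOIN_OUTPUT: List[str] = ["12", "01", "10", "01", "12", "10"]


def sort_op(rows: List[str]) -> List[str]:
    """SORT on a.col3, ascending."""
    return sorted(rows)


def toy_partition(key: str, num_partitions: int) -> int:
    """Hypothetical, deterministic stand-in for a hash partitioner."""
    return ord(key[-1]) % num_partitions


def distributed_distinct(rows: List[str], num_partitions: int = 2) -> List[str]:
    """Sketch of a shuffled GROUP_BY used to implement DISTINCT.

    Rows are routed to partitions by key, each partition keeps the first
    occurrence of every key, and partition outputs are concatenated in
    partition order. Nothing here preserves the input's global order.
    """
    partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for key in rows:
        bucket = partitions[toy_partition(key, num_partitions)]
        if key not in bucket:
            bucket.append(key)
    result: List[str] = []
    for p in range(num_partitions):
        result.extend(partitions[p])
    return result


# Plan shape as reported: GROUP_BY sits on top of SORT.
reported_plan = distributed_distinct(sort_op(JOIN_OUTPUT))

# Expected plan shape: deduplicate first, then SORT as the last operator.
expected_plan = sort_op(distributed_distinct(JOIN_OUTPUT))

print("GROUP_BY over SORT:", reported_plan)  # ['10', '12', '01']
print("SORT over GROUP_BY:", expected_plan)  # ['01', '10', '12']
```

Both plans return the same distinct values, but only the one with SORT on top keeps the ORDER BY guarantee, because no operator that may redistribute rows runs after the sort.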
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)