Krisztian Kasa created HIVE-22808:
-------------------------------------
Summary: HiveRelFieldTrimmer does not handle HiveTableFunctionScan
Key: HIVE-22808
URL: https://issues.apache.org/jira/browse/HIVE-22808
Project: Hive
Issue Type: Bug
Components: Query Planning
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
*Repro*
{code:java}
CREATE TABLE table_16 (
timestamp_col_19 timestamp,
timestamp_col_29 timestamp,
int_col_27 int,
int_col_39 int,
boolean_col_18 boolean,
varchar0045_col_23 varchar(45)
);
CREATE TABLE table_7 (
int_col_10 int,
bigint_col_3 bigint
);
CREATE TABLE table_10 (
boolean_col_8 boolean,
boolean_col_16 boolean,
timestamp_col_5 timestamp,
timestamp_col_15 timestamp,
timestamp_col_30 timestamp,
decimal3825_col_26 decimal(38, 25),
smallint_col_9 smallint,
int_col_18 int
);
explain cbo
SELECT
DISTINCT COALESCE(a4.timestamp_col_15, IF(a4.boolean_col_16,
a4.timestamp_col_30, a4.timestamp_col_5)) AS timestamp_col
FROM table_7 a3
RIGHT JOIN table_10 a4
WHERE (a3.bigint_col_3) >= (a4.int_col_18)
INTERSECT ALL
SELECT
COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS
TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
) AS timestamp_col
FROM table_16 a1
GROUP BY COALESCE(LEAST(
COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
),
GREATEST(
COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)),
COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
);
{code}
CBO Plan contains unnecessary columns or all columns from a table in
projections like:
{code:java}
HiveProject(int_col_10=[$0], bigint_col_3=[$1],
BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3],
CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])
{code}
*Cause*
The plan contains a HiveTableFunctionScan operator:
{code:java}
HiveTableFunctionScan(invocation=[replicate_rows($0, $1)],
rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])
{code}
HiveTableFunctionScan is not handled by HiveRelFieldTrimmer nor RelFieldTrimmer
which suppose to remove unused columns in the
CalcitePlanner.applyPreJoinOrderingTransforms(...) phase. The whole subtree
rooted from HiveTableFunctionScan is ignored.
Whole plan:
{code:java}
CBO PLAN:
HiveProject($f0=[$1])
HiveTableFunctionScan(invocation=[replicate_rows($0, $1)],
rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])
HiveProject($f0=[$2], $f1=[$0])
HiveFilter(condition=[=($1, 2)])
HiveAggregate(group=[{0}], agg#0=[count($1)], agg#1=[min($1)])
HiveProject($f0=[$0], $f1=[$1])
HiveUnion(all=[true])
HiveProject($f0=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count()])
HiveProject($f0=[$0])
HiveAggregate(group=[{0}])
HiveProject($f0=[CASE(IS NOT NULL($7), $7, if($5, $8,
$6))])
HiveJoin(condition=[>=($1, $13)], joinType=[inner],
algorithm=[none], cost=[not available])
HiveProject(int_col_10=[$0], bigint_col_3=[$1],
BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3],
CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])
HiveFilter(condition=[IS NOT NULL($1)])
HiveTableScan(table=[[default, table_7]],
table:alias=[a3])
HiveProject(boolean_col_16=[$0],
timestamp_col_5=[$1], timestamp_col_15=[$2], timestamp_col_30=[$3],
int_col_18=[$4], BLOCK__OFFSET__INSIDE__FILE=[$5], INPUT__FILE__NAME=[$6],
ROW__ID=[$7], CAST=[CAST($4):BIGINT])
HiveFilter(condition=[IS NOT NULL(CAST($4):BIGINT)])
HiveTableScan(table=[[default, table_10]],
table:alias=[a4])
HiveProject($f0=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count()])
HiveProject($f0=[$0])
HiveAggregate(group=[{0}])
HiveProject($f0=[CASE(IS NOT NULL(least(CASE(IS NOT
NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1,
2014-08-16 00:00:00:TIMESTAMP(9)))), least(CASE(IS NOT NULL($0), $0, 2010-03-29
00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16
00:00:00:TIMESTAMP(9))), greatest(CASE(IS NOT NULL($0), $0, 2013-07-01
00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2028-06-18
00:00:00:TIMESTAMP(9))))])
HiveTableScan(table=[[default, table_16]],
table:alias=[a1])
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)