[ https://issues.apache.org/jira/browse/IMPALA-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-2875:
----------------------------------
Priority: Major (was: Critical)
> Optimize subplans when the following plan nodes do not require parent rows.
> ---------------------------------------------------------------------------
>
> Key: IMPALA-2875
> URL: https://issues.apache.org/jira/browse/IMPALA-2875
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.3.0
> Reporter: Alexander Behm
> Priority: Major
> Labels: nested_types, performance, planner
>
> Consider the following query that references nested collections and its plan:
> Query:
> {code}
> select count(*) from tpch_nested_parquet.customer c, c.c_orders.o_lineitems l
> where c.c_mktsegment = "AUTOMOBILE"
> group by l.l_returnflag
> {code}
> Plan:
> {code}
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=304.00MB VCores=2                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | tpch_nested_parquet.customer                                                       |
> |                                                                                    |
> | 08:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 07:AGGREGATE [FINALIZE]                                                            |
> | |  output: count:merge(*)                                                          |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 06:EXCHANGE [HASH(l.l_returnflag)]                                                 |
> | |                                                                                  |
> | 05:AGGREGATE                                                                       |
> | |  output: count(*)                                                                |
> | |  group by: l.l_returnflag                                                        |
> | |                                                                                  |
> | 01:SUBPLAN                                                                         |
> | |                                                                                  |
> | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
> | |  |                                                                               |
> | |  |--02:SINGULAR ROW SRC                                                          |
> | |  |                                                                               |
> | |  03:UNNEST [c.c_orders.o_lineitems l]                                            |
> | |                                                                                  |
> | 00:SCAN HDFS [tpch_nested_parquet.customer c]                                      |
> |    partitions=1/1 files=4 size=554.13MB                                            |
> |    predicates: c.c_mktsegment = 'AUTOMOBILE'                                       |
> +------------------------------------------------------------------------------------+
> {code}
> During execution, we spend a lot of time evaluating and resetting the nested-loop join (node 04).
> However, the plan nodes that follow the subplan node in this query do not need the parent rows at all, so we could speed up the query by placing only an unnest node inside the subplan.
> This optimization is a special case of projection trimming.
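
To illustrate why the cross join can be dropped here, the following is a minimal Python sketch, not Impala code. It models the subplan as a per-parent-row loop and checks that the grouped counts are the same with and without the join against the singular row source. The sample data, the function names, and the flattening of `c_orders.o_lineitems` across orders are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stand-in for tpch_nested_parquet.customer: each parent row
# carries a nested c_orders collection, each order a nested o_lineitems one.
customers = [
    {"c_mktsegment": "AUTOMOBILE",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "R"}, {"l_returnflag": "N"}]},
                  {"o_lineitems": [{"l_returnflag": "N"}]}]},
    {"c_mktsegment": "BUILDING",
     "c_orders": [{"o_lineitems": [{"l_returnflag": "A"}]}]},
    {"c_mktsegment": "AUTOMOBILE", "c_orders": []},
]

def scan(rows):
    """Models 00:SCAN with the c_mktsegment predicate applied."""
    return (c for c in rows if c["c_mktsegment"] == "AUTOMOBILE")

def unnest(parent):
    """Models 03:UNNEST of c.c_orders.o_lineitems for one parent row."""
    for order in parent["c_orders"]:
        yield from order["o_lineitems"]

def subplan_with_cross_join(parent):
    """Current plan shape: cross join each unnested row with the parent row."""
    singular_row_src = [parent]  # 02:SINGULAR ROW SRC yields exactly one row
    for item in unnest(parent):
        for p in singular_row_src:
            yield (p, item)

def subplan_unnest_only(parent):
    """Proposed plan shape: emit only the unnested rows, no parent columns."""
    yield from unnest(parent)

def count_by_returnflag(rows, get_item):
    """Models the count(*) ... group by l.l_returnflag aggregation."""
    return Counter(get_item(r)["l_returnflag"] for r in rows)

with_join = count_by_returnflag(
    (row for c in scan(customers) for row in subplan_with_cross_join(c)),
    get_item=lambda row: row[1])
unnest_only = count_by_returnflag(
    (row for c in scan(customers) for row in subplan_unnest_only(c)),
    get_item=lambda row: row)

# The aggregation never reads parent columns, so both shapes agree.
assert with_join == unnest_only
print(dict(unnest_only))
```

Because the aggregation only reads `l_returnflag`, the parent half of each joined tuple is never used, which is the redundancy the proposed rewrite would remove.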