[ https://issues.apache.org/jira/browse/IMPALA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Rahn updated IMPALA-1728:
------------------------------
Labels: performance planner tpc-ds (was: TPC-DS performance planner)
> Sub-query with duplicate values used in an IN conditional operator should discard
> the duplicate values before applying the operator
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-1728
> URL: https://issues.apache.org/jira/browse/IMPALA-1728
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 2.0, Impala 2.1
> Reporter: Dileep Kumar
> Priority: Minor
> Labels: performance, planner, tpc-ds
> Attachments: q95.sql, q95.sql.DISTINCT
>
>
> When running TPC-DS Q95, we found that it uses the result of a CTE as the
> sub-query of an IN condition later in the query.
> In this case the CTE generates many duplicate values for the column used in
> the IN condition. Applying DISTINCT to the CTE made the query complete in
> roughly 40% less time.
> The timings (in seconds) are:
> Without DISTINCT: 1240
> With DISTINCT: 728
> Both versions of the query are attached.
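Why discarding the duplicates is safe can be sketched in a few lines of Python. This is an illustration only, not Impala's planner or executor code, and the function names (`build_hash_table`, `left_semi_join`, `dedup`) are hypothetical. It assumes the IN sub-query is evaluated as a left semi-join whose build side is a hash table storing one entry per input row, duplicates included. A semi-join emits each outer row at most once no matter how many matching build rows exist. Deduplicating the build input, the equivalent of adding DISTINCT to the CTE, therefore shrinks the table without changing the result.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def build_hash_table(build_keys: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Hypothetical build phase: keeps one entry per build row, duplicates included."""
    table: Dict[Hashable, List[int]] = defaultdict(list)
    for row_id, key in enumerate(build_keys):
        table[key].append(row_id)
    return table


def left_semi_join(
    probe_keys: Sequence[Hashable], build_keys: Sequence[Hashable]
) -> Tuple[List[Hashable], int]:
    """Return the probe keys that have at least one match, plus the number of stored build entries."""
    table = build_hash_table(build_keys)
    stored_entries = sum(len(rows) for rows in table.values())
    # A semi-join emits each probe row at most once, however many duplicates match it.
    result = [key for key in probe_keys if key in table]
    return result, stored_entries


def dedup(keys: Sequence[Hashable]) -> List[Hashable]:
    """Analogue of SELECT DISTINCT on the sub-query output (keeps first-seen order)."""
    return list(dict.fromkeys(keys))


if __name__ == "__main__":
    outer = [1, 2, 3, 4, 5]
    subquery = [2, 2, 2, 4, 4, 4, 4, 9]
    print(left_semi_join(outer, subquery))         # ([2, 4], 8)
    print(left_semi_join(outer, dedup(subquery)))  # ([2, 4], 3)
```

Both calls return the same matching rows, but the deduplicated build side stores 3 entries instead of 8. On a real query like Q95, that difference affects memory use and the cost of building the hash table.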
--
This message was sent by Atlassian Jira
(v8.3.4#803005)