[
https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-8807.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.3.0
> OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
> ----------------------------------------------------------------
>
> Key: IMPALA-8807
> URL: https://issues.apache.org/jira/browse/IMPALA-8807
> Project: IMPALA
> Issue Type: Bug
> Components: Docs
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: docs
> Fix For: Impala 3.3.0
>
>
> This came up here
> https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1&aid=1
> Our docs say
> {quote}
> This optimization does not apply if the queries contain any WHERE, GROUP BY,
> or HAVING clause. The relevant queries should only compute the minimum,
> maximum, or number of distinct values for the partition key columns across
> the whole table.
> {quote}
> This is false. Here's query illustrating it working with all three things:
> {noformat}
> [localhost:21000] default> set optimize_partition_key_scans=true; explain
> select max(ss_sold_date_sk) from tpcds_parquet.store_sales where
> ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk)
> > 1000;
> OPTIMIZE_PARTITION_KEY_SCANS set to true
> Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales
> where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having
> max(ss_sold_date_sk) > 1000
> +------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
> | Per-Host Resource Estimates: Memory=10MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 01:AGGREGATE [FINALIZE] |
> | | output: max(ss_sold_date_sk) |
> | | group by: ss_sold_date_sk |
> | | having: max(ss_sold_date_sk) > 1000 |
> | | row-size=8B cardinality=182 |
> | | |
> | 00:UNION |
> | constant-operands=182 |
> | row-size=4B cardinality=182 |
> +------------------------------------------------------------+
> Fetched 15 row(s) in 0.11s
> {noformat}
> We should reword this to be correct.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)