[ 
https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895751#comment-16895751
 ] 

ASF subversion and git services commented on IMPALA-8807:
---------------------------------------------------------

Commit b6b45c06656276edc90928c0bbb95c93e4a04f6f in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b6b45c0 ]

IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs

The docs were inaccurate about the cases in which the optimisation
applied. Happily, it actually works in a much wider set of cases.

Change-Id: I8909b23bfe2b90470fc559fbc01f1e3aa3caa85d
Reviewed-on: http://gerrit.cloudera.org:8080/13949
Reviewed-by: Alex Rodoni <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
> ----------------------------------------------------------------
>
>                 Key: IMPALA-8807
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8807
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Docs
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: docs
>             Fix For: Impala 3.3.0
>
>
> This came up here:
> https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1&aid=1
> Our docs say:
> {quote}
> This optimization does not apply if the queries contain any WHERE, GROUP BY, 
> or HAVING clause. The relevant queries should only compute the minimum, 
> maximum, or number of distinct values for the partition key columns across 
> the whole table.
> {quote}
> This is false. Here's a query illustrating it working with all three clauses:
> {noformat}
> [localhost:21000] default> set optimize_partition_key_scans=true; explain 
> select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
> ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk 
> having max(ss_sold_date_sk) > 1000;
> OPTIMIZE_PARTITION_KEY_SCANS set to true
> Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales 
> where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having 
> max(ss_sold_date_sk) > 1000
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
> | Per-Host Resource Estimates: Memory=10MB                   |
> | Codegen disabled by planner                                |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | |                                                          |
> | 01:AGGREGATE [FINALIZE]                                    |
> | |  output: max(ss_sold_date_sk)                            |
> | |  group by: ss_sold_date_sk                               |
> | |  having: max(ss_sold_date_sk) > 1000                     |
> | |  row-size=8B cardinality=182                             |
> | |                                                          |
> | 00:UNION                                                   |
> |    constant-operands=182                                   |
> |    row-size=4B cardinality=182                             |
> +------------------------------------------------------------+
> Fetched 15 row(s) in 0.11s
> {noformat}
> We should reword this to be correct.
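
The plan quoted above shows the optimisation applying: the scan is replaced by a 00:UNION node with constant operands. A minimal sketch of checking for that pattern in EXPLAIN text follows. The helper name is hypothetical, and the "SCAN HDFS" label used to detect a real table scan is an assumption about how Impala names scan nodes, not something shown in this issue.

```python
import re


def uses_partition_key_optimization(explain_text: str) -> bool:
    """Heuristic: True if the plan has a constant-operand UNION and no HDFS scan.

    Assumption: a real table scan appears in the plan as a "SCAN HDFS" node.
    """
    has_scan = re.search(r"\bSCAN HDFS\b", explain_text) is not None
    has_constant_union = re.search(r"constant-operands=\d+", explain_text) is not None
    return has_constant_union and not has_scan


# Plan body copied from the issue description (table borders stripped).
plan = """
01:AGGREGATE [FINALIZE]
|  output: max(ss_sold_date_sk)
|  group by: ss_sold_date_sk
|  having: max(ss_sold_date_sk) > 1000
|  row-size=8B cardinality=182
|
00:UNION
   constant-operands=182
   row-size=4B cardinality=182
"""

print(uses_partition_key_optimization(plan))
```

This is only a text heuristic for eyeballing plans in scripts; it does not query Impala and would need adjusting if the plan format differs.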
