[ 
https://issues.apache.org/jira/browse/IMPALA-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3475.
-----------------------------------
    Resolution: Later

We have added various optimisations since this JIRA was filed, and count(*) is 
now fast in practice on Parquet, thanks to the optimisation that reads the row 
count from the file footer and to the data cache. Extending partition key scans 
this way doesn't seem worth the risk of incorrect results.

> Extend partition key scans to support count(*)
> ----------------------------------------------
>
>                 Key: IMPALA-3475
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3475
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Minor
>
> Queries like the one below should be answered entirely from metadata when 
> store_sales is partitioned on ss_sold_date_sk:
> {code}
> select ss_sold_date_sk, count(*) from store_sales group by ss_sold_date_sk;
> {code}
> The current plan instead scans all 1824 partitions (189.24GB):
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=20.00MB VCores=2 |
> |                                                          |
> | 04:EXCHANGE [UNPARTITIONED]                              |
> | |                                                        |
> | 03:AGGREGATE [FINALIZE]                                  |
> | |  output: count:merge(*)                                |
> | |  group by: ss_sold_date_sk                             |
> | |                                                        |
> | 02:EXCHANGE [HASH(ss_sold_date_sk)]                      |
> | |                                                        |
> | 01:AGGREGATE [STREAMING]                                 |
> | |  output: count(*)                                      |
> | |  group by: ss_sold_date_sk                             |
> | |                                                        |
> | 00:SCAN HDFS [tpcds_1000_parquet.store_sales]            |
> |    partitions=1824/1824 files=1824 size=189.24GB         |
> +----------------------------------------------------------+
> {code}
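
As a rough illustration of the idea in the description, the sketch below answers the grouped count(*) purely from per-file metadata, assuming each file belongs to exactly one partition and records its row count in its footer. This is not Impala's implementation: the `FileMeta` type, the `count_star_by_partition` function, and the sample values are hypothetical and exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: partition key value plus footer row count."""
    ss_sold_date_sk: int
    num_rows: int


def count_star_by_partition(files):
    """Sum footer row counts per partition key without reading any row data."""
    counts = defaultdict(int)
    for f in files:
        counts[f.ss_sold_date_sk] += f.num_rows
    return dict(counts)


# Made-up sample metadata, for illustration only.
files = [
    FileMeta(2450816, 1000),
    FileMeta(2450816, 250),
    FileMeta(2450817, 730),
]
result = count_star_by_partition(files)
```

The result is only as accurate as the metadata it is computed from, which is the kind of correctness risk the resolution comment weighs against the performance gain.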



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
