[jira] [Commented] (HIVE-24566) Add Parquet Stats Optimization

Jesus Camacho Rodriguez (Jira) Thu, 24 Dec 2020 10:44:07 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254633#comment-17254633
 ]


Jesus Camacho Rodriguez commented on HIVE-24566:
------------------------------------------------

[~belugabehr], yes, I think this approach could improve performance for such
queries. I assume that by 'single multi-threaded processor' you mean avoiding
launching any jobs to compute these queries. For tables with a large number of
files, computing the results from metadata would still be a useful
optimization, even if jobs are launched.

> Add Parquet Stats Optimization
> -------------------------------
>
>                 Key: HIVE-24566
>                 URL: https://issues.apache.org/jira/browse/HIVE-24566
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Major
>
> Parquet files store min/max/count data in the footer metadata.
> When a query is submitted against a Parquet table and stats are not available,
> Hive should launch a single multi-threaded processor that simply reads the
> metadata of each Parquet file instead of walking through every single record
> in the table.
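A minimal Java sketch of the proposed "read footers, not records" approach follows. It is illustrative only and assumes a hypothetical `FooterReader` hook that returns one file's row count and the min/max of a single `long` column; a real implementation would obtain these values from each file's footer through the parquet-hadoop library (for example via `ParquetFileReader`), which is not shown here. The names `ParquetFooterStats`, `FileStats`, `TableStats` and `aggregate` are likewise hypothetical, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParquetFooterStats {

    /** Hypothetical per-file summary for one long column; min/max are null if the footer lacks them. */
    record FileStats(long rowCount, Long min, Long max) {}

    /** Hypothetical table-level result; min/max are null when any non-empty file lacked them. */
    record TableStats(long rowCount, Long min, Long max) {}

    /** Hypothetical hook standing in for "open one Parquet file and parse only its footer". */
    interface FooterReader {
        FileStats read(String path) throws Exception;
    }

    static TableStats aggregate(List<String> paths, FooterReader reader, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Fan out: one footer read per file, no record-level scan.
            List<Future<FileStats>> pending = new ArrayList<>();
            for (String path : paths) {
                pending.add(pool.submit(() -> reader.read(path)));
            }

            // Fan in: row counts always add up; min/max are only usable if
            // every non-empty file reported them.
            long rows = 0;
            Long min = null;
            Long max = null;
            boolean minMaxKnown = true;
            for (Future<FileStats> f : pending) {
                FileStats s = f.get();
                rows += s.rowCount();
                if (s.rowCount() == 0) {
                    continue; // empty file: contributes no values
                }
                if (s.min() == null || s.max() == null) {
                    minMaxKnown = false; // caller would have to fall back to a scan
                    continue;
                }
                min = (min == null) ? s.min() : Math.min(min, s.min());
                max = (max == null) ? s.max() : Math.max(max, s.max());
            }
            return minMaxKnown
                    ? new TableStats(rows, min, max)
                    : new TableStats(rows, null, null);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because row counts merge by addition and min/max merge with `Math.min`/`Math.max`, the same merge step would also work if the per-file summaries were produced by several launched tasks rather than one multi-threaded process. The sketch deliberately ignores null counts and non-numeric types.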



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
