[
https://issues.apache.org/jira/browse/HIVE-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254633#comment-17254633
]
Jesus Camacho Rodriguez commented on HIVE-24566:
------------------------------------------------
[~belugabehr], yes, I think this approach could improve performance for such
queries. I assume that by 'single multi-threaded processor' you mean avoiding
the launch of any jobs to compute these queries. For tables with a large
number of files, computing from metadata would still be a useful optimization
even if jobs are launched.
> Add Parquet Stats Optimization
> -------------------------------
>
> Key: HIVE-24566
> URL: https://issues.apache.org/jira/browse/HIVE-24566
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Priority: Major
>
> Parquet files store min/max/count data in footer metadata.
> When a query is submitted to a Parquet table, and stats are not available,
> Hive should launch a single multi-threaded processor that simply reads the
> metadata of each Parquet file instead of walking through every single record
> in the table.
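
The idea in the description can be sketched roughly as follows. This is only an illustration, not Hive code: FooterStatsSketch, FileStats, and the footerReader function are hypothetical names, and footerReader stands in for whatever would actually read the row count and column min/max from a Parquet file footer. A real reader would also have to fall back to a scan for files whose footers carry no statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: combine per-file footer stats on a local thread pool. */
final class FooterStatsSketch {

    /** Hypothetical per-file summary for one column; not a Hive or Parquet class. */
    static final class FileStats {
        // Identity element for merge(): zero rows, empty min/max range.
        static final FileStats EMPTY = new FileStats(0L, Long.MAX_VALUE, Long.MIN_VALUE);

        final long rowCount;
        final long min;
        final long max;

        FileStats(long rowCount, long min, long max) {
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
        }

        /** Combines two summaries: counts add up, min/max widen the range. */
        FileStats merge(FileStats other) {
            return new FileStats(rowCount + other.rowCount,
                    Math.min(min, other.min),
                    Math.max(max, other.max));
        }
    }

    /**
     * Reads one summary per file in parallel and merges them.
     * footerReader is an assumed stand-in for a real footer reader.
     */
    static FileStats aggregate(List<String> paths,
                               Function<String, FileStats> footerReader,
                               int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one footer read per file; no record is scanned.
            List<Future<FileStats>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<FileStats> task = () -> footerReader.apply(path);
                futures.add(pool.submit(task));
            }
            // Merge results as they become available.
            FileStats total = FileStats.EMPTY;
            for (Future<FileStats> future : futures) {
                total = total.merge(future.get());
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while reading footers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Footer read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With such a summary, queries like COUNT(*), MIN(col), or MAX(col) could be answered from total.rowCount, total.min, and total.max without launching a job, at a cost that grows with the number of files rather than the number of records.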
--
This message was sent by Atlassian Jira
(v8.3.4#803005)