[ https://issues.apache.org/jira/browse/PARQUET-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144173#comment-14144173 ]
Tongjie Chen commented on PARQUET-100:
--------------------------------------
Hi [~julienledem], we have actually turned that flag on in production. It does
reduce the memory footprint, but the footers are still read before that split
strategy is applied.
This JIRA is about avoiding reading the footers entirely.
> provide an option in parquet-pig to avoid reading footers in client side
> ------------------------------------------------------------------------
>
> Key: PARQUET-100
> URL: https://issues.apache.org/jira/browse/PARQUET-100
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: parquet-mr_1.6.0
> Reporter: Tongjie Chen
>
> Parquet Pig reads footers on the client side to calculate splits, retrieve
> the schema, etc.
> In an HCatalog environment, if Hive has generated a large number of files,
> Parquet-Pig spends a significant amount of time processing those footers on
> the client side (before the job is submitted to the cluster).