[ https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa updated HIVE-21815:
----------------------------------
Status: Open (was: Patch Available)
> Stats in ORC file are parsed twice
> ----------------------------------
>
> Key: HIVE-21815
> URL: https://issues.apache.org/jira/browse/HIVE-21815
> Project: Hive
> Issue Type: Improvement
> Components: ORC
> Reporter: Gopal V
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-21815.1.patch, orc-tail-getproto.png,
> tez-am-2x-protobuf.svg
>
>
> The ORC record reader unnecessarily parses stats twice:
> {code}
> if (orcTail == null) {
>   Reader orcReader = OrcFile.createReader(file.getPath(),
>       OrcFile.readerOptions(context.conf)
>           .filesystem(fs)
>           .maxLength(AcidUtils.getLogicalLength(fs, file)));
>   orcTail = new OrcTail(orcReader.getFileTail(),
>       orcReader.getSerializedFileFooter(),
>       file.getModificationTime());
>   if (context.cacheStripeDetails) {
>     context.footerCache.put(new FooterCacheKey(fsFileId,
>         file.getPath()), orcTail);
>   }
> }
> stripes = orcTail.getStripes();
> stripeStats = orcTail.getStripeStatistics();
> {code}
> The call chain is Reader -> OrcTail -> StripeStatistics.
> stripeStats is read out of the orcTail, although the same statistics have
> already been read inside orcReader.getStripeStatistics().
> !orc-tail-getproto.png!
> [^tez-am-2x-protobuf.svg]
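
For illustration only, here is a minimal sketch of the general parse-once idea: wrap the expensive decode in a memoizing supplier so that every caller shares one parsed result. It does not reflect the contents of the attached patch, and all names (ParseOnceSketch, parseStats, memoize, parseCount) are hypothetical stand-ins, not ORC or Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ParseOnceSketch {
    /** Counts how often the simulated decode runs (hypothetical instrumentation). */
    static int parseCount = 0;

    /** Hypothetical stand-in for decoding serialized stripe statistics. */
    static List<String> parseStats(byte[] serialized) {
        parseCount++;
        List<String> stats = new ArrayList<>();
        for (byte b : serialized) {
            stats.add("stripe-" + b);
        }
        return stats;
    }

    /** Wraps a supplier so that its result is computed at most once. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        byte[] footer = {1, 2, 3};
        Supplier<List<String>> stats = memoize(() -> parseStats(footer));
        stats.get(); // first consumer triggers the decode
        stats.get(); // second consumer reuses the cached result
        System.out.println(parseCount); // prints 1
    }
}
```

Whether the parsed statistics are best cached on the reader, on the tail object, or passed between them is a design choice this sketch does not settle; it only shows that a second consumer need not trigger a second decode.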