[ https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-21815:
---------------------------
Description:
ORC record reader unnecessarily parses stats twice
{code}
if (orcTail == null) {
  Reader orcReader = OrcFile.createReader(file.getPath(),
      OrcFile.readerOptions(context.conf)
          .filesystem(fs)
          .maxLength(AcidUtils.getLogicalLength(fs, file)));
  orcTail = new OrcTail(orcReader.getFileTail(),
      orcReader.getSerializedFileFooter(),
      file.getModificationTime());
  if (context.cacheStripeDetails) {
    context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()),
        orcTail);
  }
}
stripes = orcTail.getStripes();
stripeStats = orcTail.getStripeStatistics();
{code}
The call path is Reader -> OrcTail -> StripeStatistics.
stripeStats is read out of the orcTail, even though the same statistics are
already read inside orcReader.getStripeStatistics(), so they are parsed twice.
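To make the double parse concrete, here is a small self-contained sketch. It is a toy model and not Hive or ORC code: StatsParseSketch, Reader, Tail and parseStats are hypothetical stand-ins, and it assumes that the reader has already deserialized the stats once by the time the tail is built. It only counts how often the pretend deserialization runs when the tail re-parses the serialized footer versus when it is handed the reader's parsed result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: none of these types are the real ORC/Hive classes.
class StatsParseSketch {
    // Counts how many times the (pretend) expensive deserialization runs.
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    // Stand-in for deserializing stripe statistics from footer bytes.
    static List<String> parseStats(byte[] serialized) {
        PARSE_COUNT.incrementAndGet();
        List<String> stats = new ArrayList<>();
        for (byte b : serialized) {
            stats.add("stripe-" + b);
        }
        return stats;
    }

    // Hypothetical reader; assumed to parse the stats once when opened.
    static final class Reader {
        final byte[] serialized;
        final List<String> stats;

        Reader(byte[] serialized) {
            this.serialized = serialized;
            this.stats = parseStats(serialized);
        }
    }

    // Hypothetical tail: parses lazily unless parsed stats are supplied.
    static final class Tail {
        private final byte[] serialized;
        private List<String> stats; // null until parsed or supplied

        Tail(byte[] serialized, List<String> alreadyParsed) {
            this.serialized = serialized;
            this.stats = alreadyParsed;
        }

        List<String> getStripeStatistics() {
            if (stats == null) {
                stats = parseStats(serialized);
            }
            return stats;
        }
    }

    // Shape described in the issue: the tail only gets the serialized bytes.
    static int parsesWhenTailReparses(byte[] footer) {
        PARSE_COUNT.set(0);
        Reader reader = new Reader(footer);
        Tail tail = new Tail(reader.serialized, null);
        tail.getStripeStatistics();
        return PARSE_COUNT.get();
    }

    // One possible alternative: pass the reader's parsed stats to the tail.
    static int parsesWhenTailReusesReaderStats(byte[] footer) {
        PARSE_COUNT.set(0);
        Reader reader = new Reader(footer);
        Tail tail = new Tail(reader.serialized, reader.stats);
        tail.getStripeStatistics();
        return PARSE_COUNT.get();
    }

    public static void main(String[] args) {
        byte[] footer = {1, 2, 3};
        System.out.println(parsesWhenTailReparses(footer));          // prints 2
        System.out.println(parsesWhenTailReusesReaderStats(footer)); // prints 1
    }
}
```

Whether the real change should reuse the reader's parsed stats, defer parsing in the tail, or do something else is not stated in this issue; the sketch only shows where the second parse comes from.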
!orc-tail-getproto.png!
[^tez-am-2x-protobuf.svg]
was:
ORC record reader unnecessarily parses stats twice
!orc-tail-getproto.png|thumbnail!
[^tez-am-2x-protobuf.svg]
> Stats in ORC file are parsed twice
> ----------------------------------
>
> Key: HIVE-21815
> URL: https://issues.apache.org/jira/browse/HIVE-21815
> Project: Hive
> Issue Type: Improvement
> Components: ORC
> Reporter: Gopal V
> Priority: Major
> Attachments: orc-tail-getproto.png, tez-am-2x-protobuf.svg
>