-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70832/
-----------------------------------------------------------
Review request for hive, Ashutosh Chauhan, Gopal V, and Prasanth_J.
Bugs: HIVE-21815
https://issues.apache.org/jira/browse/HIVE-21815
Repository: hive-git
Description
-------
Stats in ORC file are parsed twice
==================================
The ORC record reader unnecessarily parses the stats twice:
```
if (orcTail == null) {
  Reader orcReader = OrcFile.createReader(file.getPath(),
      OrcFile.readerOptions(context.conf)
          .filesystem(fs)
          .maxLength(AcidUtils.getLogicalLength(fs, file)));
  orcTail = new OrcTail(orcReader.getFileTail(),
      orcReader.getSerializedFileFooter(),
      file.getModificationTime());
  if (context.cacheStripeDetails) {
    context.footerCache.put(new FooterCacheKey(fsFileId, file.getPath()),
        orcTail);
  }
}
stripes = orcTail.getStripes();
stripeStats = orcTail.getStripeStatistics();
```
The code goes from Reader -> OrcTail -> StripeStatistics.
stripeStats is read out of the orcTail, even though the same statistics are
already read inside orcReader.getStripeStatistics(), so they are parsed twice.
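The general idea of avoiding the second parse can be sketched in isolation. The example below is only an illustration under stated assumptions: `StripeStatsMemo`, `Tail`, and `parseStats` are hypothetical stand-ins, not Hive or ORC classes, and this is not necessarily how the patch itself addresses the issue. It shows a tail object that reuses statistics already parsed elsewhere and otherwise parses them at most once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not the Hive or ORC classes: they only model
// "serialized footer bytes that are expensive to parse into statistics".
public class StripeStatsMemo {

    /** Counts how many times the simulated expensive parse runs. */
    static int parseCount = 0;

    /** Simulated expensive parse of serialized stripe statistics. */
    static List<String> parseStats(byte[] serialized) {
        parseCount++;
        List<String> stats = new ArrayList<>();
        for (byte b : serialized) {
            stats.add("stripe-" + b);
        }
        return stats;
    }

    /** Hypothetical tail object that parses lazily and keeps the result. */
    static final class Tail {
        private final byte[] serialized;
        private List<String> stats; // null until parsed or supplied

        /** Tail with nothing parsed yet; the first getter call parses. */
        Tail(byte[] serialized) {
            this.serialized = serialized;
        }

        /** Tail built from a reader that has already parsed the statistics. */
        Tail(byte[] serialized, List<String> alreadyParsed) {
            this.serialized = serialized;
            this.stats = alreadyParsed;
        }

        List<String> getStripeStatistics() {
            if (stats == null) {
                stats = parseStats(serialized); // parse at most once
            }
            return stats;
        }
    }

    public static void main(String[] args) {
        byte[] footer = {1, 2, 3};
        List<String> fromReader = parseStats(footer); // the reader's own parse
        Tail tail = new Tail(footer, fromReader);     // reuse instead of re-parsing
        tail.getStripeStatistics();
        tail.getStripeStatistics();
        System.out.println("parses: " + parseCount);  // prints "parses: 1"
    }
}
```

Whether the statistics are handed over by the reader or memoized on first access is a design choice; either way the serialized footer is decoded only once per file.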
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 3878bba4d3
Diff: https://reviews.apache.org/r/70832/diff/1/
Testing
-------
Ran the TestInputOutputFormat tests.
Thanks,
Krisztian Kasa