[ 
https://issues.apache.org/jira/browse/HIVE-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862639#comment-16862639
 ] 

Gopal V commented on HIVE-21815:
--------------------------------

LGTM - +1

> Stats in ORC file are parsed twice
> ----------------------------------
>
>                 Key: HIVE-21815
>                 URL: https://issues.apache.org/jira/browse/HIVE-21815
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>            Reporter: Gopal V
>            Assignee: Krisztian Kasa
>            Priority: Major
>         Attachments: HIVE-21815.1.patch, HIVE-21815.1.patch, 
> HIVE-21815.2.patch, orc-tail-getproto.png, tez-am-2x-protobuf.svg
>
>
> ORC record reader unnecessarily parses stats twice
> {code}
>       if (orcTail == null) {
>         Reader orcReader = OrcFile.createReader(file.getPath(),
>             OrcFile.readerOptions(context.conf)
>                 .filesystem(fs)
>                 .maxLength(AcidUtils.getLogicalLength(fs, file)));
>         orcTail = new OrcTail(orcReader.getFileTail(),
>             orcReader.getSerializedFileFooter(),
>             file.getModificationTime());
>         if (context.cacheStripeDetails) {
>           context.footerCache.put(
>               new FooterCacheKey(fsFileId, file.getPath()), orcTail);
>         }
>       }
>       stripes = orcTail.getStripes();
>       stripeStats = orcTail.getStripeStatistics();
> {code}
> We go from Reader -> OrcTail -> StripeStatistics.
> The stripe statistics are already parsed inside
> orcReader.getStripeStatistics(), yet stripeStats is read out of the
> orcTail again, so the same statistics are parsed a second time.
> !orc-tail-getproto.png!
>  [^tez-am-2x-protobuf.svg] 
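
The double parse described above can be illustrated with a minimal, self-contained sketch. It is not the change made in the attached patches and it does not use the ORC or Hive APIs: ParseOnceSketch, TailSketch, parseStripeStats and PARSE_COUNT are hypothetical names invented for illustration. The idea shown is only that a tail object can accept statistics that were already parsed upstream and otherwise parse them lazily, once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ParseOnceSketch {
    /** Counts how many times the simulated statistics section is deserialized. */
    public static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    /** Hypothetical stand-in for decoding stripe statistics from serialized bytes. */
    public static List<String> parseStripeStats(byte[] serialized) {
        PARSE_COUNT.incrementAndGet();
        List<String> stats = new ArrayList<>();
        for (byte b : serialized) {
            stats.add("stripe-" + b);
        }
        return stats;
    }

    /** Hypothetical tail holder: keeps the serialized bytes and caches parsed stats. */
    public static final class TailSketch {
        private final byte[] serialized;
        private List<String> stripeStats; // null until parsed or supplied

        /** No parsed statistics available yet: parse lazily on first access. */
        public TailSketch(byte[] serialized) {
            this(serialized, null);
        }

        /** Reuse statistics the caller has already parsed instead of parsing again. */
        public TailSketch(byte[] serialized, List<String> alreadyParsed) {
            this.serialized = serialized;
            this.stripeStats = alreadyParsed;
        }

        public List<String> getStripeStatistics() {
            if (stripeStats == null) {
                stripeStats = parseStripeStats(serialized);
            }
            return stripeStats;
        }
    }

    public static void main(String[] args) {
        byte[] footer = {1, 2, 3};
        // The "reader" side parses the statistics once...
        List<String> fromReader = parseStripeStats(footer);
        // ...and hands the parsed result to the tail, so later calls do not re-parse.
        TailSketch tail = new TailSketch(footer, fromReader);
        tail.getStripeStatistics();
        tail.getStripeStatistics();
        System.out.println("parse count = " + PARSE_COUNT.get()); // prints "parse count = 1"
    }
}
```

If the tail were instead built from the bytes alone after the reader had already parsed them, the counter would reach 2, which is the situation the description reports.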



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
