[ https://issues.apache.org/jira/browse/HIVE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121962#comment-15121962 ]
Prasanth Jayachandran edited comment on HIVE-12847 at 1/28/16 5:45 PM:
-----------------------------------------------------------------------
[~owen.omalley] was discussing adding a reader interface that returns the
serialized representation of the footer, metadata, etc. If we have that, then
getting the serialized representation from it will be trivial. Owen, is there a
patch/JIRA already for that?
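A minimal sketch of why a serialized representation would help with this issue, assuming the cache is keyed by file path and stores the footer as raw bytes: a byte array has a known size, so the cache can be bounded by total bytes held rather than by entry count. `SerializedFooterCache` and its methods are hypothetical names for illustration only; they are not part of Hive or ORC, and this is not the approach taken in the attached patch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: an LRU cache of serialized ORC footers bounded by the
 * total number of bytes held, not by the number of entries.
 */
final class SerializedFooterCache {
    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder=true: iteration goes from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
        new LinkedHashMap<>(16, 0.75f, true);

    SerializedFooterCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized void put(String path, byte[] serializedFooter) {
        // Replace any existing entry and release its bytes first.
        byte[] old = entries.remove(path);
        if (old != null) {
            currentBytes -= old.length;
        }
        // A footer larger than the whole budget is never cached.
        if (serializedFooter.length > maxBytes) {
            return;
        }
        entries.put(path, serializedFooter);
        currentBytes += serializedFooter.length;
        // Evict least-recently-used entries until back under budget.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            currentBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    // Returns the cached bytes, or null on a miss; a hit marks the entry as recently used.
    synchronized byte[] get(String path) {
        return entries.get(path);
    }

    synchronized long bytesUsed() {
        return currentBytes;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a budget of 10 bytes, inserting three 4-byte footers evicts the least recently used one, so the cache never holds more than 10 bytes regardless of how many stripes each footer describes. The trade-off is that each cache hit has to deserialize the footer again.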
> ORC file footer cache should be memory sensitive
> ------------------------------------------------
>
> Key: HIVE-12847
> URL: https://issues.apache.org/jira/browse/HIVE-12847
> Project: Hive
> Issue Type: Improvement
> Components: File Formats, ORC
> Affects Versions: 1.2.1
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Attachments: HIVE-12847.patch
>
>
> The size-based footer cache cannot control memory usage properly.
> We have seen HiveServer2 hang (in continuous full GC) because the ORC file
> footer cache took up too much heap memory.
> A simple query like "select * from orc_table limit 1" can make HiveServer2
> hang.
> The input table has about 1000 ORC files, and each ORC file has about 2500
> stripes.
> {noformat}
>  num  #instances       #bytes  class name
> ----------------------------------------------
>   1:   214653601  25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
>   3:   122233301   8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
>   5:    89439001   6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
>   7:     2981300    262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
>   9:     2981300    143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
>  12:     2983691     71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
>  15:       80929      7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
>  17:      103282      5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
>  20:       51641      3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
>  21:       51641      3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
>  31:           1       413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
> 100:        1122        26928  org.apache.hadoop.hive.ql.io.orc.Metadata
>
> {noformat}
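The histogram figures can be cross-checked with some quick arithmetic. The sketch below uses only numbers copied from the histogram; the class and method names are hypothetical, and treating one `Metadata` instance as one cached file is an assumption, not something the report states.

```java
/** Hypothetical helper: sanity-checks the heap histogram figures from the report. */
final class FooterHistogramMath {
    // Values copied from the histogram in the issue description.
    static final long COLUMN_STATS_INSTANCES = 214_653_601L;
    static final long COLUMN_STATS_BYTES = 25_758_432_120L;
    static final long STRIPE_INFO_INSTANCES = 2_981_300L;
    static final long METADATA_INSTANCES = 1_122L;

    // Shallow heap size of one OrcProto$ColumnStatistics object, in bytes.
    static long bytesPerColumnStats() {
        return COLUMN_STATS_BYTES / COLUMN_STATS_INSTANCES;
    }

    // ColumnStatistics objects retained per stripe (integer division).
    static long columnStatsPerStripe() {
        return COLUMN_STATS_INSTANCES / STRIPE_INFO_INSTANCES;
    }

    // Stripes per Metadata instance; assumes one Metadata object per cached file.
    static long stripesPerMetadata() {
        return STRIPE_INFO_INSTANCES / METADATA_INSTANCES;
    }

    // Heap held by ColumnStatistics objects alone, in GiB.
    static double columnStatsGiB() {
        return COLUMN_STATS_BYTES / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println("bytes per ColumnStatistics: " + bytesPerColumnStats());
        System.out.println("ColumnStatistics per stripe: " + columnStatsPerStripe());
        System.out.println("stripes per Metadata: " + stripesPerMetadata());
        System.out.printf("ColumnStatistics heap: %.2f GiB%n", columnStatsGiB());
    }
}
```

The results line up with the description: roughly 2,657 stripes per `Metadata` instance matches "about 2500 stripes" per file, and about 72 `ColumnStatistics` objects of 120 bytes each per stripe add up to nearly 24 GiB for that one class. A cache limited only by entry count has no way to see this, since a single entry can pin thousands of stripes' worth of statistics.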
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)