[jira] [Commented] (ORC-435) Ability to read stripes that are greater than 2GB

Rahul Pathak (JIRA) Tue, 20 Nov 2018 13:32:53 -0800


    [ 
https://issues.apache.org/jira/browse/ORC-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693804#comment-16693804
 ]


Rahul Pathak commented on ORC-435:
----------------------------------

Hi [~prasanth_j],

Any timeline on when we can get this fixed?

Rahul


> Ability to read stripes that are greater than 2GB
> -------------------------------------------------
>
>                 Key: ORC-435
>                 URL: https://issues.apache.org/jira/browse/ORC-435
>             Project: ORC
>          Issue Type: Bug
>          Components: Reader
>    Affects Versions: 1.3.4, 1.4.4, 1.6.0, 1.5.3
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>             Fix For: 1.5.4, 1.6.0
>
>
> ORC reader fails with NegativeArraySizeException if the stripe size is >2GB. 
> Even though default stripe size is 64MB there are cases where stripe size 
> will reach >2GB even before memory manager can kick in to check memory size. 
> Say if we are inserting 500KB strings (mostly unique) by the time we reach 
> 5000 rows stripe size is already over 2GB. Reader will have to chunk the disk 
> range reads for such cases instead of reading the stripe as whole blob. 
> Exception thrown when reading such files
> {code:java}
> 2018-10-12 21:43:58,833 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NegativeArraySizeException
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:272)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:1007)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:835)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1029)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1062)
>         at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1085){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (ORC-435) Ability to read stripes that are greater than 2GB

Reply via email to