[ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473739#comment-16473739
 ] 

Vineet Garg commented on HIVE-19479:
------------------------------------

[~sershe] Can you upload a patch for branch-3?

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> ------------------------------------------------------------
>
>                 Key: HIVE-19479
>                 URL: https://issues.apache.org/jira/browse/HIVE-19479
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>
> The PositionProvider offset is not updated correctly, which can cause an
> error like this:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
>       at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
>       at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
>       at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
>       at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
>       at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
>       at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
>       at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
>       at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
> {noformat}
> We found this happens when ORC writes a strange stream combination: the data
> stream for an RG has no values (all of its rows are null), but the length
> stream still contains values (zeros) for the same rows. That is technically a
> valid ORC file, although writing the zeros is useless.
> This may be fixed separately in ORC, but since these files now exist in the
> wild, we should handle them correctly.
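
A minimal sketch of how this kind of misalignment could arise, assuming the index positions for a column are read through one shared cursor, one entry per stream, in DATA then LENGTH order. This is not the Hive or ORC code and not the attached patch: PositionCursor, Stream, repositionLength and demo are hypothetical names, and the offsets (541, 12) and the LENGTH size (40) are illustrative values chosen to echo the message in the stack trace. If the reader skips the empty DATA stream without also consuming its index entry, the LENGTH stream ends up seeking to the DATA offset.

```java
public class EmptyRowGroupSeekSketch {

    /** Hypothetical cursor over one row group's index positions, in stream order. */
    static final class PositionCursor {
        private final long[] positions;
        private int next = 0;

        PositionCursor(long... positions) {
            this.positions = positions;
        }

        long getNext() {
            return positions[next++];
        }
    }

    /** Hypothetical stream that only tracks its length and validates seeks. */
    static final class Stream {
        final String name;
        final long length;
        long offset = 0;

        Stream(String name, long length) {
            this.name = name;
            this.length = length;
        }

        void seek(PositionCursor cursor) {
            long target = cursor.getNext();
            if (target > length) {
                throw new IllegalArgumentException(
                    "Seek in " + name + " to " + target + " is outside of the data");
            }
            offset = target;
        }
    }

    /**
     * Repositions the LENGTH stream. A null data stream models a row group
     * whose DATA stream is empty. When consumeSkippedEntry is false, the DATA
     * index entry is left in the cursor and LENGTH reads it by mistake.
     */
    static long repositionLength(Stream data, Stream length,
                                 PositionCursor cursor, boolean consumeSkippedEntry) {
        if (data != null) {
            data.seek(cursor);
        } else if (consumeSkippedEntry) {
            cursor.getNext(); // discard the DATA entry so LENGTH sees its own
        }
        length.seek(cursor);
        return length.offset;
    }

    /** Assumed index entries: DATA at 541, LENGTH at 12; LENGTH holds 40 bytes. */
    static long demo(boolean consumeSkippedEntry) {
        PositionCursor cursor = new PositionCursor(541L, 12L);
        Stream length = new Stream("LENGTH", 40L);
        return repositionLength(null, length, cursor, consumeSkippedEntry);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        try {
            demo(false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model, consuming the skipped entry lets LENGTH seek to 12, while leaving it in the cursor reproduces an "outside of the data" failure of the same shape as the one reported.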



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
