[ https://issues.apache.org/jira/browse/ORC-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved ORC-1393.
--------------------------------
Fix Version/s: 1.9.0, 1.8.3
Resolution: Fixed
Issue resolved by pull request 1432
[https://github.com/apache/orc/pull/1432]
> Wrong length of uncompressed stream causes EOFException when reading
> --------------------------------------------------------------------
>
> Key: ORC-1393
> URL: https://issues.apache.org/jira/browse/ORC-1393
> Project: ORC
> Issue Type: Bug
> Reporter: Dmitriy Fingerman
> Assignee: Dmitriy Fingerman
> Priority: Major
> Fix For: 1.9.0, 1.8.3
>
>
> This issue is the root cause of the issue reported in HIVE-27128.
> Before 'ORC-516 - Update InStream for column compression', the
> InStream.UncompressedStream class had a 'length' field, and that length
> could be changed through the reset() method.
> The SettableUncompressedStream class called reset() from its
> setBuffers() method:
>
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeInfo) {
>   reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
>   setOffset(diskRangeInfo.getDiskRanges());
> }{code}
> After the ORC version in Hive was upgraded to 1.6.7, and because the
> SettableUncompressedStream class had been removed from the ORC code base,
> Hive maintains its own copy of SettableUncompressedStream. That copy doesn't
> pass a new length to UncompressedStream when calling reset (because the
> reset method of UncompressedStream no longer accepts a new length):
>
> {code:java}
> public void setBuffers(DiskRangeInfo diskRangeList) {
>   reset(diskRangeList.getDiskRanges());
>   setOffset(diskRangeList.getDiskRanges());
> }{code}
> While investigating the issue reported in HIVE-27128, I compared the lengths
> of the InStream.UncompressedStream before and after the upgrade of the ORC
> version in Hive to 1.6.7 (which included ORC-516). The issue occurs with the
> ORC-516 changes because the length of the InStream.UncompressedStream is set
> once for all row groups, whereas without those changes the length is dynamic
> and is sometimes set to a bigger value than the initial value.
>
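For readers of this thread, the failure mode described above can be illustrated with a small standalone sketch. It is not the ORC or Hive code: LengthSketch, SketchStream, readFully and hitsEof are hypothetical names, and a plain byte array stands in for the disk ranges. It only assumes what the description states, namely that a length fixed at construction is reused after reset() while a later buffer is larger.

```java
import java.io.EOFException;
import java.util.Arrays;

// Hypothetical model of the problem; not the actual InStream.UncompressedStream.
final class LengthSketch {

    static final class SketchStream {
        private byte[] data;
        private long length;   // logical length checked on every read
        private int position;

        SketchStream(byte[] data, long length) {
            this.data = data;
            this.length = length;
        }

        // Post-ORC-516 style reset: new buffer, length left unchanged.
        void reset(byte[] newData) {
            this.data = newData;
            this.position = 0;
        }

        // Pre-ORC-516 style reset: new buffer and new length together.
        void reset(byte[] newData, long newLength) {
            reset(newData);
            this.length = newLength;
        }

        // Reads n bytes, failing if the request exceeds the recorded length.
        byte[] readFully(int n) throws EOFException {
            if (position + n > length) {
                throw new EOFException("Read past end of stream, length=" + length);
            }
            byte[] out = Arrays.copyOfRange(data, position, position + n);
            position += n;
            return out;
        }
    }

    // Returns true if reading n bytes from the stream raises EOFException.
    static boolean hitsEof(SketchStream stream, int n) {
        try {
            stream.readFully(n);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SketchStream stale = new SketchStream(new byte[4], 4);
        stale.reset(new byte[8]);          // length stays at 4
        System.out.println("stale length hits EOF: " + hitsEof(stale, 8));

        SketchStream updated = new SketchStream(new byte[4], 4);
        updated.reset(new byte[8], 8);     // length follows the new buffer
        System.out.println("updated length hits EOF: " + hitsEof(updated, 8));
    }
}
```

In this sketch the one-argument reset() leaves the recorded length at its initial value, so a read sized for the larger buffer fails, while the two-argument reset() does not. How the actual fix in pull request 1432 restores the length is not shown here.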
--
This message was sent by Atlassian Jira
(v8.20.10#820010)