[ 
https://issues.apache.org/jira/browse/ORC-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Fingerman updated ORC-1393:
-----------------------------------
    Description: 
This issue is the root cause of the issue reported in HIVE-27128.

Before 'ORC-516 - Update InStream for column compression', the 
InStream.UncompressedStream class had a 'length' field, and that length could 
be changed through the reset() method. 

SettableUncompressedStream called this reset() method from its setBuffers() 
method:

 
{code:java}
public void setBuffers(DiskRangeInfo diskRangeInfo) {
  reset(diskRangeInfo.getDiskRanges(), diskRangeInfo.getTotalLength());
  setOffset(diskRangeInfo.getDiskRanges());
}{code}
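
To make the old contract concrete, here is a minimal sketch of a stream whose reset() replaces the readable length together with the buffers. ResettableLengthStream, totalLength() and the use of plain ByteBuffer lists are hypothetical stand-ins chosen for illustration; they are not the actual ORC or Hive classes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the pre-ORC-516 stream; not the real ORC class.
class ResettableLengthStream {
  private List<ByteBuffer> ranges;
  private long length;

  ResettableLengthStream(List<ByteBuffer> ranges, long length) {
    reset(ranges, length);
  }

  // Replaces both the buffers and the readable length in one call.
  void reset(List<ByteBuffer> newRanges, long newLength) {
    this.ranges = newRanges;
    this.length = newLength;
  }

  long getLength() {
    return length;
  }

  // Sums the remaining bytes of all ranges (assumed analogue of getTotalLength()).
  static long totalLength(List<ByteBuffer> ranges) {
    long total = 0;
    for (ByteBuffer b : ranges) {
      total += b.remaining();
    }
    return total;
  }
}

class ResettableLengthDemo {
  public static void main(String[] args) {
    List<ByteBuffer> first = Arrays.asList(ByteBuffer.allocate(4));
    ResettableLengthStream in =
        new ResettableLengthStream(first, ResettableLengthStream.totalLength(first));

    // A later row group needs more bytes; the length follows the new buffers.
    List<ByteBuffer> second = Arrays.asList(ByteBuffer.allocate(8), ByteBuffer.allocate(8));
    in.reset(second, ResettableLengthStream.totalLength(second));
    System.out.println(in.getLength()); // prints 16
  }
}
```

In this model the length always matches whatever buffers were installed last, which is the property the two-argument reset() provided.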
After the ORC upgrade in Hive to 1.6.7, and because the 
SettableUncompressedStream class was removed from the ORC code base, Hive 
maintains its own copy of SettableUncompressedStream. That copy does not pass 
the new length to UncompressedStream when calling reset(), because 
UncompressedStream.reset() no longer accepts a length:

 
{code:java}
public void setBuffers(DiskRangeInfo diskRangeList) {
  reset(diskRangeList.getDiskRanges());
  setOffset(diskRangeList.getDiskRanges());
} {code}
While investigating HIVE-27128, I compared the lengths of 
InStream.UncompressedStream before and after Hive's upgrade to ORC 1.6.7 
(which included ORC-516). The issue happens with the ORC-516 changes because 
the length of InStream.UncompressedStream is set once for all row groups, 
whereas without those changes the length is dynamic and is sometimes set to a 
larger value than the initial one.
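
The failure mode can be illustrated with a small model of a stream whose length is fixed at construction. This is only a sketch under assumptions: FixedLengthStream, readUpTo() and the bounds check in read() are hypothetical and simplified, and the real InStream.UncompressedStream may detect the end of the stream differently.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

// Hypothetical, simplified model; not the actual ORC InStream.UncompressedStream.
class FixedLengthStream {
  private final long length; // set once at construction and never updated
  private ByteBuffer data;
  private long position;

  FixedLengthStream(ByteBuffer initial, long length) {
    this.length = length;
    reset(initial);
  }

  // Installs new data and rewinds, but has no way to change the length.
  void reset(ByteBuffer newData) {
    this.data = newData.duplicate();
    this.position = 0;
  }

  int read() throws EOFException {
    if (position >= length) {
      throw new EOFException("position " + position + " reached length " + length);
    }
    position++;
    return data.get() & 0xff;
  }
}

class FixedLengthDemo {
  // Tries to read `count` bytes and returns how many were read before EOF.
  static int readUpTo(FixedLengthStream in, int count) {
    int n = 0;
    try {
      while (n < count) {
        in.read();
        n++;
      }
    } catch (EOFException e) {
      // The stale length cut the read short.
    }
    return n;
  }

  public static void main(String[] args) {
    // First row group: 4 bytes, and the length is 4.
    FixedLengthStream in = new FixedLengthStream(ByteBuffer.wrap(new byte[4]), 4);
    System.out.println(readUpTo(in, 4)); // prints 4

    // A later row group has 8 bytes, but the length is still 4.
    in.reset(ByteBuffer.wrap(new byte[8]));
    System.out.println(readUpTo(in, 8)); // prints 4
  }
}
```

In this model the second read stops after 4 of the 8 available bytes, which corresponds to the case above where a later row group needs a larger length than the initial one.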

 


> Wrong length of uncompressed stream causes EOFException when reading
> --------------------------------------------------------------------
>
>                 Key: ORC-1393
>                 URL: https://issues.apache.org/jira/browse/ORC-1393
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
