Syed Shameerur Rahman created ORC-626:
-----------------------------------------

             Summary: Reading STRUCT Column Having Multiple Fields With Same Name Causes java.io.EOFException
                 Key: ORC-626
                 URL: https://issues.apache.org/jira/browse/ORC-626
             Project: ORC
          Issue Type: Bug
            Reporter: Syed Shameerur Rahman


*Steps To Repro In Hive:*


{code:java}
set hive.fetch.task.conversion=none;
set orc.force.positional.evolution=true;

create table complex_orc(device struct<a:string,a:string,b:string>) stored as orc;
insert into complex_orc select named_struct("a","123","a","823","b","23");
{code}

*Fails with the following exception:*


{code:java}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 3 kind LENGTH position: 6 length: 6 range: 0 offset: 16 limit: 16 range 0 = 0 to 6 uncompressed: 3 to 3
        at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
        at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
        at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1299)
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1336)
        at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1434)
        at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1280)
        at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1836)
        at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1818)
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1149)
{code}

This is caused by ORC-54, which changed schema evolution to match fields by 
name rather than by index. Setting *orc.force.positional.evolution* forces 
positional schema evolution, but the positional level is hardcoded to 1 (for 
non-ACID), which appears to mean that only top-level columns are matched by 
position and the fields inside the struct are still matched by name. Even 
though it doesn't make sense to have multiple fields with the same name in a 
struct, this breaks backward compatibility with Hive 1.2 / Hive 2.1.
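
To illustrate why duplicate field names are a problem for name-based matching, here is a minimal, self-contained sketch. It is not ORC's actual schema evolution code: the class `FieldMatchingSketch` and the methods `matchByName` and `matchByPosition` are hypothetical names made up for this example. It also assumes that a name lookup keeps only the last field seen for a given name. Whether this collapse is the exact mechanism behind the EOFException above is likewise an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the ORC code base.
public class FieldMatchingSketch {

    // Name-based matching: each reader field is looked up by name in the file
    // schema. With duplicate names, a later file field overwrites an earlier
    // one in the map, so two reader fields end up on the same file field.
    public static int[] matchByName(List<String> fileFields, List<String> readerFields) {
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            indexByName.put(fileFields.get(i), i);
        }
        int[] result = new int[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            result[i] = indexByName.getOrDefault(readerFields.get(i), -1);
        }
        return result;
    }

    // Positional matching: reader field i maps to file field i, so names
    // (and duplicate names) play no role. -1 marks a missing file field.
    public static int[] matchByPosition(List<String> fileFields, List<String> readerFields) {
        int[] result = new int[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            result[i] = i < fileFields.size() ? i : -1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Field names of struct<a:string,a:string,b:string> from the repro.
        List<String> fields = Arrays.asList("a", "a", "b");
        System.out.println(Arrays.toString(matchByName(fields, fields)));     // [1, 1, 2]
        System.out.println(Arrays.toString(matchByPosition(fields, fields))); // [0, 1, 2]
    }
}
```

In this sketch the name-based mapping never reaches the first field of the struct, while the positional mapping keeps all three fields distinct, which is the behaviour the reporter expects when orc.force.positional.evolution is set.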

[~omalley] Can you please share the idea behind setting the *positional level* 
to 1? Is it really required when orc.force.positional.evolution is set?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
