[
https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975047#comment-15975047
]
David Capwell commented on HIVE-16480:
--------------------------------------
I didn't know if I should put this here or in ORC jira, but found this after
upgrading from 0.14.0 hive to 2.1.1 so felt I would start here.
> ORC file with empty array<double> and array<float> fails to read
> ----------------------------------------------------------------
>
> Key: HIVE-16480
> URL: https://issues.apache.org/jira/browse/HIVE-16480
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.1
> Reporter: David Capwell
>
> We have a schema that has a array<double> in it. We were unable to read this
> file and digging into ORC it seems that the issue is when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work
> with type float
> java.io.IOException: Error reading file:
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-float.orc
> at
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
> ~[hive-exec-2.1.1.jar:2.1.1]
> at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[na:1.8.0_121]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[na:1.8.0_121]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[na:1.8.0_121]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> [junit-4.12.jar:4.12]
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> [junit-4.12.jar:4.12]
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> [junit-4.12.jar:4.12]
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
> at
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> [junit-rt.jar:na]
> at
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
> [junit-rt.jar:na]
> at
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
> [junit-rt.jar:na]
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> [junit-rt.jar:na]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[na:1.8.0_121]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[na:1.8.0_121]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[na:1.8.0_121]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
> at
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> ~[hive-orc-2.1.1.jar:2.1.1]
> ... 29 common frames omitted
> INFO 2017-04-19 09:29:17,091 [main] [WriterImpl] [line 205] ORC writer
> created for path:
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
> with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize:
> 262144
> INFO 2017-04-19 09:29:17,100 [main] [ReaderImpl] [line 357] Reading ORC rows
> from
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
> with {include: null, offset: 0, length: 9223372036854775807}
> INFO 2017-04-19 09:29:17,101 [main] [RecordReaderImpl] [line 142] Schema on
> read not provided -- using file schema array<double>
> ERROR 2017-04-19 09:29:17,104 [main] [EmptyList] [line 56] Failed to work
> with type double
> java.io.IOException: Error reading file:
> /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
> at
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135)
> ~[hive-exec-2.1.1.jar:2.1.1]
> at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[na:1.8.0_121]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[na:1.8.0_121]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[na:1.8.0_121]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> [junit-4.12.jar:4.12]
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> [junit-4.12.jar:4.12]
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> [junit-4.12.jar:4.12]
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> [junit-4.12.jar:4.12]
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> [junit-4.12.jar:4.12]
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
> at
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> [junit-rt.jar:na]
> at
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
> [junit-rt.jar:na]
> at
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
> [junit-rt.jar:na]
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> [junit-rt.jar:na]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[na:1.8.0_121]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[na:1.8.0_121]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[na:1.8.0_121]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream
> for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
> at
> org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:101)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:97)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:713)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154)
> ~[hive-orc-2.1.1.jar:2.1.1]
> at
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> ~[hive-orc-2.1.1.jar:2.1.1]
> ... 29 common frames omitted
> {code}
> If you create a ORC file with one row as the following
> {code}
> orc.addRow(Lists.newArrayList());
> {code}
> then try to read it
> {code}
> VectorizedRowBatch batch = reader.getSchema().createRowBatch();
> while (rows.nextBatch(batch)) { }
> {code}
> You will produce the above stack trace.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)