[ https://issues.apache.org/jira/browse/ORC-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941280#comment-16941280 ]
Owen O'Malley commented on ORC-555:
-----------------------------------
Ok, I see what is going on. I'll get a fix today.
> IllegalArgumentException when reading files written with older ORC writers in ORC 1.6
> -------------------------------------------------------------------------------------
>
> Key: ORC-555
> URL: https://issues.apache.org/jira/browse/ORC-555
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Shardul Mahadik
> Assignee: Owen O'Malley
> Priority: Major
>
> I am using {{orc-core::nohive}} to read an ORC file that was generated using
> an older version of ORC (probably through Hive 1.1). I cannot read this
> file with ORC 1.6, but I can read it with 1.5.5.
> Code:
> {code:java}
> final Reader orcReader = OrcFile.createReader(
>     new Path("/Users/smahadik/orcFailure.orc"),
>     OrcFile.readerOptions(new Configuration()));
> System.out.println(orcReader.getNumberOfRows());
> {code}
> Stacktrace:
> {code:java}
> java.io.IOException: Problem reading file footer /Users/smahadik/orcFailure.orc
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:716)
> at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:500)
> at org.apache.orc.OrcFile.createReader(OrcFile.java:365)
> at example.testFileFooterReadFailure(TestOrcMetrics.java:16)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.lang.IllegalArgumentException
> at java.nio.Buffer.position(Buffer.java:244)
> at org.apache.orc.impl.InStream$CompressedStream.setCurrent(InStream.java:453)
> at org.apache.orc.impl.InStream$CompressedStream.reset(InStream.java:440)
> at org.apache.orc.impl.InStream$CompressedStream.<init>(InStream.java:426)
> at org.apache.orc.impl.InStream.create(InStream.java:843)
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:706)
> ... 25 more
> {code}
> Unfortunately, I cannot share the data file that triggers the failure. I am
> not really familiar with the ORC codebase, so I am not sure what is actually
> happening here. I will keep digging and report anything more that I find.
> Here's what I know so far. The error occurs at
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/InStream.java#L453
> because the limit of the {{compressed}} buffer is less than the position the
> code is trying to set. Execution gets there through this if condition in
> {{ReaderImpl}}, which was changed recently:
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L691
> The {{extra}} value is around 3k, so the code seems to replace the original
> buffer of limit 16k with a new buffer of limit 3k. This smaller buffer is
> passed to
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L706
> and the read eventually fails.
> Values of some variables at line 706:
> size = 309950950
> readSize = 16384
> psLen = 26
> psOffset = 309950923
> tailSize = 20314
> footerSize = 3650
> metadataSize = 16637
> extra = 3930
> buffer = data range [309930636, 309934566), size: 3930 type: array-backed
> buffer.next = data range [309934566, 309950950), size: 16384 type: array-backed
> stripeStatSize = 0
> Does anyone have any insights/intuition about what might be happening and how
> we can debug this?
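
The values reported in the description are internally consistent, which can be checked with a small sketch. It assumes the file tail is laid out as metadata, footer, postscript, and a final one-byte postscript length; the class and method names are hypothetical and are not part of ORC. It also shows the general java.nio behaviour behind the exception: setting a position beyond a buffer's limit throws IllegalArgumentException. The position 4000 is only an illustrative value that fits a 16384-byte buffer but not a 3930-byte one; the actual position used by the reader is not stated in the report.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class for illustration only; it is not part of ORC.
class TailSizeCheck {
    // Assumed layout: metadata + footer + postscript + 1 byte holding psLen.
    static long tailSize(long psLen, long footerSize, long metadataSize) {
        return 1 + psLen + footerSize + metadataSize;
    }

    // Bytes of the tail that did not fit in the initial read (0 if all fit).
    static long extra(long tailSize, long readSize) {
        return Math.max(0L, tailSize - readSize);
    }

    // Returns true if a buffer with the given limit accepts the position.
    static boolean positionFits(int limit, int position) {
        ByteBuffer buf = ByteBuffer.allocate(limit);
        try {
            buf.position(position);
            return true;
        } catch (IllegalArgumentException e) {
            // Buffer.position rejects any position greater than the limit.
            return false;
        }
    }

    public static void main(String[] args) {
        // Values copied from the issue description.
        long size = 309950950L;
        long readSize = 16384L;
        long tail = tailSize(26, 3650, 16637);

        System.out.println("tailSize   = " + tail);
        System.out.println("extra      = " + extra(tail, readSize));
        System.out.println("psOffset   = " + (size - 1 - 26));
        System.out.println("tail start = " + (size - tail));

        // Illustrative position: valid for the 16k buffer, not the 3930-byte one.
        System.out.println("fits 16384 = " + positionFits(16384, 4000));
        System.out.println("fits 3930  = " + positionFits(3930, 4000));
    }
}
```

Under these assumptions the computed tail size, extra size, postscript offset, and tail start can be compared directly against the tailSize, extra, psOffset, and buffer range values listed in the description.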