[ https://issues.apache.org/jira/browse/ORC-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shardul Mahadik updated ORC-555:
--------------------------------
    Affects Version/s: 1.6.0

> IllegalArgumentException when reading files written with older ORC writers in ORC 1.6
> -------------------------------------------------------------------------------------
>
>                 Key: ORC-555
>                 URL: https://issues.apache.org/jira/browse/ORC-555
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> I am using {{orc-core::nohive}} to read an ORC file that was generated by an
> older version of ORC (probably through Hive 1.1). ORC 1.5.5 reads this file
> fine, but reading it fails starting with ORC 1.6.
> Code:
> {code:java}
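> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.orc.OrcFile;
> import org.apache.orc.Reader;
> final Reader orcReader = OrcFile.createReader(
>     new Path("/Users/smahadik/orcFailure.orc"),
>     OrcFile.readerOptions(new Configuration()));
> System.out.println(orcReader.getNumberOfRows());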
> {code}
> Stacktrace:
> {code:java}
> java.io.IOException: Problem reading file footer /Users/smahadik/orcFailure.orc
>       at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:716)
>       at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:500)
>       at org.apache.orc.OrcFile.createReader(OrcFile.java:365)
>       at example.testFileFooterReadFailure(TestOrcMetrics.java:16)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>       at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>       at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>       at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>       at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.lang.IllegalArgumentException
>       at java.nio.Buffer.position(Buffer.java:244)
>       at org.apache.orc.impl.InStream$CompressedStream.setCurrent(InStream.java:453)
>       at org.apache.orc.impl.InStream$CompressedStream.reset(InStream.java:440)
>       at org.apache.orc.impl.InStream$CompressedStream.<init>(InStream.java:426)
>       at org.apache.orc.impl.InStream.create(InStream.java:843)
>       at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:706)
>       ... 25 more
> {code}
> Unfortunately I cannot share the data file that triggers the failure. I am not
> very familiar with the ORC codebase, so I am not sure what is actually
> happening here, but I will keep digging and post anything else I find.
> Here's what I know so far. The error occurs at
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/InStream.java#L453
> because the limit of the {{compressed}} buffer is smaller than the position
> being set on it. The read goes through this recently changed {{if}} condition
> in {{ReaderImpl}}:
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L691
> Because {{extra}} is around 3k, the code seems to replace the original buffer
> (limit 16k) with a new buffer whose limit is only about 3k. This smaller
> buffer is then passed to
> https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L706
> where it eventually fails.
> Values of some variables at line 706 of {{ReaderImpl}}:
> size = 309950950
> readSize = 16384
> psLen = 26
> psOffset = 309950923
> tailSize = 20314
> footerSize = 3650
> metadataSize = 16637
> extra = 3930
> buffer = data range [309930636, 309934566), size: 3930 type: array-backed
> buffer.next = data range [309934566, 309950950), size: 16384 type: array-backed
> stripeStatSize = 0
> Does anyone have any insight into what might be happening, or suggestions for
> how we can debug this further?
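
The values captured at line 706 can be cross-checked with a small standalone sketch. It assumes the standard ORC file-tail layout ([metadata][footer][postscript][1-byte postscript length]) and a tail-size formula that is an assumption chosen because it reproduces the reported 20314; {{OrcTailCheck}} and its methods are hypothetical and do not call ORC. It does not establish which stream {{ReaderImpl}} line 706 was creating. It only shows that the reported values are mutually consistent, and that the footer, if addressed relative to the start of the 3930-byte buffer, would sit at offset 16637, past that buffer's limit. That is the kind of call for which {{java.nio.Buffer.position}} throws {{IllegalArgumentException}}.

{code:java}
import java.nio.ByteBuffer;

/**
 * Hypothetical helper, not ORC code: re-derives the file-tail layout from the
 * values reported above, assuming the tail is laid out as
 * [metadata][footer][postscript][1-byte postscript length].
 */
public class OrcTailCheck {

    // Assumed formula; it matches the reported tailSize of 20314.
    static int tailSize(int psLen, int footerSize, int metadataSize, int stripeStatSize) {
        return 1 + psLen + footerSize + metadataSize + stripeStatSize;
    }

    // The postscript ends just before the final length byte of the file.
    static long postscriptOffset(long fileSize, int psLen) {
        return fileSize - 1 - psLen;
    }

    // True if Buffer.position(newPosition) is rejected for a buffer with this limit.
    static boolean positionRejected(int limit, int newPosition) {
        ByteBuffer buf = ByteBuffer.allocate(limit); // a fresh buffer has limit == capacity
        try {
            buf.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // same exception type as the "Caused by" in the trace
        }
    }

    public static void main(String[] args) {
        // Values copied from the issue description.
        long size = 309_950_950L;
        int readSize = 16_384;
        int psLen = 26;
        int footerSize = 3_650;
        int metadataSize = 16_637;
        int stripeStatSize = 0;

        long psOffset = postscriptOffset(size, psLen);                        // 309950923
        int tail = tailSize(psLen, footerSize, metadataSize, stripeStatSize); // 20314
        int extra = tail - readSize;                                          // 3930
        long tailStart = size - tail;                                         // 309930636
        long footerStart = psOffset - footerSize;                             // 309947273
        long footerInExtra = footerStart - tailStart;                         // 16637

        System.out.println("psOffset     = " + psOffset);
        System.out.println("tailSize     = " + tail);
        System.out.println("extra        = " + extra);
        System.out.println("buffer range = [" + tailStart + ", " + (tailStart + extra) + ")");
        System.out.println("footer offset within extra buffer = " + footerInExtra);
        System.out.println("position rejected: "
                + positionRejected(extra, (int) footerInExtra));
    }
}
{code}

Compiled and run on its own, it prints the same {{psOffset}}, {{tailSize}}, {{extra}} and buffer range as the debugger values above, and its last line is {{position rejected: true}}. Note that the metadata section also starts exactly at 309930636, the start of the 3930-byte buffer.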



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
