[ https://issues.apache.org/jira/browse/HIVE-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225540#comment-16225540 ]
Mithun Radhakrishnan edited comment on HIVE-10024 at 10/30/17 6:53 PM: ----------------------------------------------------------------------- (I have put this off long enough. Sorry for the delay in updating this JIRA. We ran into this a good while back in production.) The sparse summary and description belie the significance of this fix. It turns out that this wasn't a fix for test-failures at all. Before this fix, in cases where the last row-group for an ORC stripe contained fewer records than {{$\{orc.row.index.stride\}}}, and when predicate pushdown is enabled, one sees the following sort of failure: {noformat} java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 130 is outside of the data at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 130 is outside of the data at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 14 more {noformat} It's a fairly rare case, but leads to bad reads on valid ORC files. This fix is available in {{branch-2}} and forward, but not in {{branch-1}}. was (Author: mithun): I have put this off long enough. The sparse summary and description belie the significance of this fix. It turns out that this wasn't a fix for test-failures at all. Before this fix, in cases where the last row-group for an ORC stripe contained fewer records than {{$\{orc.row.index.stride\}}}, and when predicate pushdown is enabled, one sees the following sort of failure: {noformat} java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 130 is outside of the data at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 82 kind DATA to 130 is outside of the data at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:322) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 14 more {noformat} It's a fairly rare case, but leads to bad reads on valid ORC files. This fix is available in {{branch-2}} and forward, but not in {{branch-1}}. > LLAP: q file test is broken again > --------------------------------- > > Key: HIVE-10024 > URL: https://issues.apache.org/jira/browse/HIVE-10024 > Project: Hive > Issue Type: Sub-task > Reporter: Sergey Shelukhin > Assignee: Sergey Shelukhin > Fix For: llap > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)