[ https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-2359:
---------------------------------
Labels: pull-request-available (was: )
> Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
> -------------------------------------------------------------------------------------
>
> Key: HDDS-2359
> URL: https://issues.apache.org/jira/browse/HDDS-2359
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Istvan Fajth
> Assignee: Shashikant Banerjee
> Priority: Critical
> Labels: pull-request-available
>
> During Hive testing we found the following exception:
> {code}
> TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1569246922012_0214_1_03_000000_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: error iterating
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: error iterating
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 16 more
> Caused by: java.io.IOException: java.io.IOException: error iterating
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
>     at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>     ... 18 more
> Caused by: java.io.IOException: error iterating
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361)
>     ... 24 more
> Caused by: java.io.IOException: Error reading file: o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_0000001_0000001_0000/bucket_00000
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1283)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:156)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$1.next(VectorizedOrcAcidRowBatchReader.java:150)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$1.next(VectorizedOrcAcidRowBatchReader.java:146)
>     at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:831)
>     ... 26 more
> Caused by: java.io.IOException: Inconsistent read for blockID=conID: 2 locID: 102851451236759576 bcsId: 14608 length=26398272 numBytesRead=6084153
>     at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:176)
>     at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>     at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
>     at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
>     at org.apache.orc.impl.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:557)
>     at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readFileData(RecordReaderUtils.java:276)
>     at org.apache.orc.impl.RecordReaderImpl.readPartialDataStreams(RecordReaderImpl.java:1189)
>     at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1057)
>     at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1208)
>     at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1243)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1279)
>     ... 30 more
> {code}
> Evaluating the code path, the issue is the following:
> given a key that holds more than 2 blocks of data,
> when the reader seeks randomly in the file, first to the end and then back
> to the beginning,
> then the read fails with the final cause of the exception above.
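> A minimal sketch of the failing access pattern through the Hadoop
> FileSystem API is below; the o3fs path and buffer size are placeholders,
> and any key spanning more than 2 blocks should reproduce it:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class SeekRepro {
>   public static void main(String[] args) throws IOException {
>     // Placeholder path: any key whose data spans more than 2 blocks.
>     Path path = new Path("o3fs://bucket.volume/key-larger-than-2-blocks");
>     FileSystem fs = FileSystem.get(path.toUri(), new Configuration());
>     long length = fs.getFileStatus(path).getLen();
>     byte[] buf = new byte[1024];
>     try (FSDataInputStream in = fs.open(path)) {
>       // Positioned read near the end of the key (seeks forward internally)...
>       in.readFully(length - buf.length, buf);
>       // ...then a positioned read back at the beginning; this is the read
>       // that fails with "Inconsistent read for blockID=... numBytesRead=...".
>       in.readFully(0, buf);
>     }
>   }
> }
> {code}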
> [~shashikant] already has a solution for this issue, which we have
> successfully tested internally with Hive; I am assigning this JIRA to him
> so he can post the PR.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]