[
https://issues.apache.org/jira/browse/ORC-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886948#comment-17886948
]
Dongjoon Hyun edited comment on ORC-1787 at 10/4/24 3:22 PM:
-------------------------------------------------------------
Thank you for reporting, [~oscartorreno]. The HADOOP-18103 change in
`org.apache.hadoop.fs.FileRange` seems to cause your issue: Hadoop recently
introduced parallel (vectored) reads.
The reported code path is the `FileRange.getData().get()` part of the following:
https://github.com/apache/orc/blob/9bd9e5a18f3e4a2c1298c0b52b0f62993bf476be/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L595
{code:java}
fileInputStream.readVectored(fileRanges, allocate);
for (FileRange r : fileRanges) {
  cur = map.get(r);
  try {
    cur.setChunk(r.getData().get()); // Here
  } catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
  }
}
{code}
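For illustration, here is a minimal, self-contained sketch (not ORC code) of how a failed range future surfaces at that `get()` call: the asynchronous `AsynchronousCloseException` arrives wrapped in an `ExecutionException`, which the loop above rethrows as a `RuntimeException`.
{code:java}
import java.nio.channels.AsynchronousCloseException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class FailedRangeDemo {
  // Stand-in for FileRange.getData(): a future that failed asynchronously,
  // as a vectored-read range does when its file channel is closed mid-read.
  static String wrappedCauseName() {
    CompletableFuture<byte[]> data = new CompletableFuture<>();
    data.completeExceptionally(new AsynchronousCloseException());
    try {
      data.get();
      return "completed";
    } catch (InterruptedException | ExecutionException e) {
      // Same handling as readDiskRangesVectored: the real cause is e.getCause()
      return e.getCause().getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(wrappedCauseName()); // AsynchronousCloseException
  }
}
{code}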
was (Author: dongjoon):
Thank you for reporting, [~oscartorreno]. The HADOOP-18103 change in
`org.apache.hadoop.fs.FileRange` seems to cause your issue: Hadoop recently
introduced parallel (vectored) reads.
The reported code path is the `FileRange.getData().get()` part of the following:
https://github.com/apache/orc/blob/9bd9e5a18f3e4a2c1298c0b52b0f62993bf476be/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L595
{code:java}
for (FileRange r : fileRanges) {
  cur = map.get(r);
  try {
    cur.setChunk(r.getData().get()); // Here
  } catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
  }
}
{code}
> Failure on readDiskRangesVectored with concurrent record readers
> ----------------------------------------------------------------
>
> Key: ORC-1787
> URL: https://issues.apache.org/jira/browse/ORC-1787
> Project: ORC
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Oscar Torreno
> Priority: Major
>
> We have an application that builds RecordReaders and reads rows concurrently.
> Our code worked fine on ORC v1.9.1, but it started failing sporadically with
> ORC 2.0.2. We could not find any claims about thread-safety guarantees for
> record reader creation. Are we supposed to be able to construct RecordReaders
> in parallel for a single file, making this an implementation bug? Or is that
> unsupported, meaning RecordReader creation should be synchronized? When we
> synchronize the creation, the problem goes away.
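> A minimal, dependency-free sketch of that synchronization workaround (the names
> below are illustrative stand-ins, not ORC API; the real guarded call would be
> `reader.rows(options)` on a shared `org.apache.orc.Reader`):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> public class SynchronizedCreationDemo {
>   private static final Object CREATE_LOCK = new Object();
>
>   // Serialize only the reader-creation step; row reading itself
>   // can still proceed in parallel afterwards.
>   static int createAndRead(int taskId) {
>     synchronized (CREATE_LOCK) {
>       // hypothetical stand-in for: RecordReader rows = reader.rows(options);
>     }
>     return taskId; // stand-in for reading rows
>   }
>
>   public static void main(String[] args) throws Exception {
>     ExecutorService pool = Executors.newFixedThreadPool(4);
>     List<Future<Integer>> results = new ArrayList<>();
>     for (int i = 0; i < 8; i++) {
>       final int id = i;
>       results.add(pool.submit(() -> createAndRead(id)));
>     }
>     int sum = 0;
>     for (Future<Integer> f : results) sum += f.get();
>     pool.shutdown();
>     System.out.println(sum); // 28
>   }
> }
> {code}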
>
> The stacktrace:
> {code:java}
> java.nio.channels.AsynchronousCloseException
>     at sun.nio.ch.SimpleAsynchronousFileChannelImpl$2.run(SimpleAsynchronousFileChannelImpl.java:326)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run(Thread.java:840)
> [wrapped] java.util.concurrent.ExecutionException: java.nio.channels.AsynchronousCloseException
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>     at org.apache.orc.impl.RecordReaderUtils.readDiskRangesVectored(RecordReaderUtils.java:595)
> [wrapped] java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.AsynchronousCloseException
>     at org.apache.orc.impl.RecordReaderUtils.readDiskRangesVectored(RecordReaderUtils.java:597)
>     at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readFileData(RecordReaderUtils.java:114)
>     at org.apache.orc.impl.reader.StripePlanner.readData(StripePlanner.java:178)
>     at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1314)
>     at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1354)
>     at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1397)
>     at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:367)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)