[
https://issues.apache.org/jira/browse/ORC-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886948#comment-17886948
]
Dongjoon Hyun edited comment on ORC-1787 at 10/4/24 3:22 PM:
-------------------------------------------------------------
Thank you for reporting, [~oscartorreno]. The HADOOP-18103 change in
`org.apache.hadoop.fs.FileRange` seems to cause your issue: Hadoop recently
introduced parallel (vectored) reads.
The reported code path is the `FileRange.getData().get()` part of the following:
https://github.com/apache/orc/blob/9bd9e5a18f3e4a2c1298c0b52b0f62993bf476be/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L595
{code:java}
fileInputStream.readVectored(fileRanges, allocate);
for (FileRange r : fileRanges) {
  cur = map.get(r);
  try {
    cur.setChunk(r.getData().get()); // Here
  } catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
  }
}
{code}
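For illustration, here is a minimal, self-contained sketch (not ORC code) of how a failed range future surfaces at that `get()` call: the asynchronous `AsynchronousCloseException` arrives wrapped in an `ExecutionException`, which the loop above rethrows as a `RuntimeException`.
{code:java}
import java.nio.channels.AsynchronousCloseException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class FailedRangeDemo {
  // Stand-in for FileRange.getData(): a future that failed asynchronously,
  // as a vectored-read range does when its file channel is closed mid-read.
  static String wrappedCauseName() {
    CompletableFuture<byte[]> data = new CompletableFuture<>();
    data.completeExceptionally(new AsynchronousCloseException());
    try {
      data.get();
      return "completed";
    } catch (InterruptedException | ExecutionException e) {
      // Same handling as readDiskRangesVectored: the real cause is e.getCause()
      return e.getCause().getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(wrappedCauseName()); // AsynchronousCloseException
  }
}
{code}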
was (Author: dongjoon):
Thank you for reporting, [~oscartorreno]. The HADOOP-18103 change in
`org.apache.hadoop.fs.FileRange` seems to cause your issue: Hadoop recently
introduced parallel (vectored) reads.
The reported code path is the `FileRange.getData().get()` part of the following:
https://github.com/apache/orc/blob/9bd9e5a18f3e4a2c1298c0b52b0f62993bf476be/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L595
{code:java}
for (FileRange r : fileRanges) {
  cur = map.get(r);
  try {
    cur.setChunk(r.getData().get()); // Here
  } catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
  }
}
{code}
> Failure on readDiskRangesVectored with concurrent record readers
> ----------------------------------------------------------------
>
> Key: ORC-1787
> URL: https://issues.apache.org/jira/browse/ORC-1787
> Project: ORC
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Oscar Torreno
> Priority: Major
>
> We have an application that builds RecordReaders and reads rows concurrently.
> Our code worked fine on ORC v1.9.1, but it started failing sporadically with
> ORC 2.0.2. We could not find any claims about thread-safety guarantees for
> record reader creation. Are we supposed to be able to construct RecordReaders
> in parallel for a single file, making this an implementation bug? Or is that
> unsupported, meaning RecordReader creation should be synchronized? When we
> synchronize the creation, the problem goes away.
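> A minimal, dependency-free sketch of that synchronization workaround (the names
> below are illustrative stand-ins, not ORC API; the real guarded call would be
> `reader.rows(options)` on a shared `org.apache.orc.Reader`):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> public class SynchronizedCreationDemo {
>   private static final Object CREATE_LOCK = new Object();
>
>   // Serialize only the reader-creation step; row reading itself
>   // can still proceed in parallel afterwards.
>   static int createAndRead(int taskId) {
>     synchronized (CREATE_LOCK) {
>       // hypothetical stand-in for: RecordReader rows = reader.rows(options);
>     }
>     return taskId; // stand-in for reading rows
>   }
>
>   public static void main(String[] args) throws Exception {
>     ExecutorService pool = Executors.newFixedThreadPool(4);
>     List<Future<Integer>> results = new ArrayList<>();
>     for (int i = 0; i < 8; i++) {
>       final int id = i;
>       results.add(pool.submit(() -> createAndRead(id)));
>     }
>     int sum = 0;
>     for (Future<Integer> f : results) sum += f.get();
>     pool.shutdown();
>     System.out.println(sum); // 28
>   }
> }
> {code}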
>
> The stacktrace:
> {code:java}
> java.nio.channels.AsynchronousCloseException
>     at sun.nio.ch.SimpleAsynchronousFileChannelImpl$2.run(SimpleAsynchronousFileChannelImpl.java:326)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run(Thread.java:840)
> [wrapped] java.util.concurrent.ExecutionException: java.nio.channels.AsynchronousCloseException
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>     at org.apache.orc.impl.RecordReaderUtils.readDiskRangesVectored(RecordReaderUtils.java:595)
> [wrapped] java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.AsynchronousCloseException
>     at org.apache.orc.impl.RecordReaderUtils.readDiskRangesVectored(RecordReaderUtils.java:597)
>     at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readFileData(RecordReaderUtils.java:114)
>     at org.apache.orc.impl.reader.StripePlanner.readData(StripePlanner.java:178)
>     at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1314)
>     at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1354)
>     at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1397)
>     at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:367)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)