[
https://issues.apache.org/jira/browse/HADOOP-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929944#comment-17929944
]
ASF GitHub Bot commented on HADOOP-19303:
-----------------------------------------
cnauroth commented on code in PR #7418:
URL: https://github.com/apache/hadoop/pull/7418#discussion_r1968617441
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java:
##########
@@ -136,4 +137,30 @@ default void readVectored(List<? extends FileRange> ranges,
IntFunction<ByteBuffer> allocate) throws IOException {
VectoredReadUtils.readVectored(this, ranges, allocate);
}
+
+ /**
+ * Variant of {@link #readVectored(List, IntFunction)} where a release() function
+ * may be invoked if problems surface during reads -this method is called to
+ * try to return any allocated buffer which has not been read yet.
+ * Buffers which have successfully been read and returned to the caller do not
+ * get released: this is for failures only.
+ * <p>
+ * The default implementation calls readVectored/2 so as to ensure that
+ * if an existing stream implementation does not implement this method
+ * all is good.
+ * <p>
+ * Implementations SHOULD override this method if they can release buffers as
+ * part of their error handling.
+ * @param ranges the byte ranges to read
+ * @param allocate the function to allocate ByteBuffer
+ * @param release the function to release a ByteBuffer.
+ * @throws IOException any IOE.
+ * @throws IllegalArgumentException if the any of ranges are invalid, or they overlap.
Review Comment:
nit: seems like some words are out of order. Maybe "if any of the ranges..."?
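For context, a minimal caller-side sketch of the new three-argument
overload, pairing allocation and release through a pool. Illustrative
only: `fs`, `path` and the readWithPool method are assumptions, and
ElasticByteBufferPool is just one possible pool implementation.

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileRange;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.ElasticByteBufferPool;

    // Sketch: read two ranges, returning unread buffers to a pool on failure.
    void readWithPool(FileSystem fs, Path path) throws Exception {
      ElasticByteBufferPool pool = new ElasticByteBufferPool();
      List<FileRange> ranges = Arrays.asList(
          FileRange.createFileRange(0, 4096),
          FileRange.createFileRange(65536, 4096));
      try (FSDataInputStream in = fs.open(path)) {
        in.readVectored(ranges,
            len -> pool.getBuffer(false, len),  // allocate from the pool
            pool::putBuffer);                   // release on read failure
        for (FileRange range : ranges) {
          ByteBuffer buf = range.getData().get();  // await completion
          // ... consume the buffer ...
          pool.putBuffer(buf);  // caller returns delivered buffers itself
        }
      }
    }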
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java:
##########
@@ -319,74 +321,125 @@ AsynchronousFileChannel getAsyncChannel() throws IOException {
@Override
public void readVectored(List<? extends FileRange> ranges,
IntFunction<ByteBuffer> allocate) throws IOException {
+ readVectored(ranges, allocate, LOG_BYTE_BUFFER_RELEASED);
+ }
+
+ @Override
+ public void readVectored(final List<? extends FileRange> ranges,
+ final IntFunction<ByteBuffer> allocate,
+ final Consumer<ByteBuffer> release) throws IOException {
// Validate, but do not pass in a file length as it may change.
List<? extends FileRange> sortedRanges = sortRangeList(ranges);
- // Set up all of the futures, so that we can use them if things fail
- for(FileRange range: sortedRanges) {
+ // Set up all of the futures, so that the caller can await on
+ // their competion.
Review Comment:
nit: "completion"
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSDataInputStream.java:
##########
@@ -306,4 +307,11 @@ public void readVectored(List<? extends FileRange> ranges,
IntFunction<ByteBuffer> allocate) throws IOException {
((PositionedReadable) in).readVectored(ranges, allocate);
}
+
+ @Override
+ public void readVectored(final List<? extends FileRange> ranges,
Review Comment:
Do we need coverage of this in the spec?
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html#void_readVectored.28List.3C.3F_extends_FileRange.3E_ranges.2C_IntFunction.3CByteBuffer.3E_allocate.29
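For context, the three-argument override here presumably passes straight
through to the wrapped stream, mirroring the two-argument form quoted
above (a sketch, not the exact diff):

    @Override
    public void readVectored(final List<? extends FileRange> ranges,
        final IntFunction<ByteBuffer> allocate,
        final Consumer<ByteBuffer> release) throws IOException {
      // delegate to the wrapped stream's release-aware overload
      ((PositionedReadable) in).readVectored(ranges, allocate, release);
    }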
> VectorIO API to support releasing buffers on failure
> ----------------------------------------------------
>
> Key: HADOOP-19303
> URL: https://issues.apache.org/jira/browse/HADOOP-19303
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Extend the vector IO API with a method that takes a ByteBufferPool
> implementation rather than just an allocator. This allows buffers to be
> returned to the pool when problems occur, before throwing an exception.
> The Parquet API is already designed for this.
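A rough implementation-side sketch (not the PR's code) of the contract
the description and the PositionedReadable javadoc lay out: a buffer
allocated for a range that has not yet been delivered is handed back
through release() before the failure propagates, while buffers already
completed belong to the caller. readIntoBuffer is a hypothetical helper.

    ByteBuffer buffer = allocate.apply(range.getLength());
    try {
      readIntoBuffer(buffer, range);       // hypothetical single-range read
      range.getData().complete(buffer);    // delivered: caller owns it now
    } catch (IOException e) {
      release.accept(buffer);              // undelivered: return to the pool
      range.getData().completeExceptionally(e);
      throw e;
    }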