[
https://issues.apache.org/jira/browse/HADOOP-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929239#comment-17929239
]
ASF GitHub Bot commented on HADOOP-19303:
-----------------------------------------
mukund-thakur commented on code in PR #7418:
URL: https://github.com/apache/hadoop/pull/7418#discussion_r1966018354
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java:
##########
@@ -319,74 +321,125 @@ AsynchronousFileChannel getAsyncChannel() throws IOException {
  @Override
  public void readVectored(List<? extends FileRange> ranges,
      IntFunction<ByteBuffer> allocate) throws IOException {
+ readVectored(ranges, allocate, LOG_BYTE_BUFFER_RELEASED);
+ }
+
+ @Override
+ public void readVectored(final List<? extends FileRange> ranges,
+ final IntFunction<ByteBuffer> allocate,
+ final Consumer<ByteBuffer> release) throws IOException {
// Validate, but do not pass in a file length as it may change.
List<? extends FileRange> sortedRanges = sortRangeList(ranges);
- // Set up all of the futures, so that we can use them if things fail
- for(FileRange range: sortedRanges) {
+ // Set up all of the futures, so that the caller can await on
+ // their completion.
+ for (FileRange range: sortedRanges) {
validateRangeRequest(range);
range.setData(new CompletableFuture<>());
}
- try {
- AsynchronousFileChannel channel = getAsyncChannel();
- ByteBuffer[] buffers = new ByteBuffer[sortedRanges.size()];
- AsyncHandler asyncHandler = new AsyncHandler(channel, sortedRanges, buffers);
- for(int i = 0; i < sortedRanges.size(); ++i) {
- FileRange range = sortedRanges.get(i);
- buffers[i] = allocate.apply(range.getLength());
- channel.read(buffers[i], range.getOffset(), i, asyncHandler);
- }
- } catch (IOException ioe) {
- LOG.debug("Exception occurred during vectored read ", ioe);
Review Comment:
Seems like this was unnecessary: any failure will automatically be caught in
the failed() method.
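
For context, a minimal sketch of the point (this is not the PR's actual AsyncHandler; the field and parameter names are assumptions): a CompletionHandler's failed() callback can release the unread buffer and fail the range's future, so a try/catch around the read submission adds little.

```java
import java.nio.ByteBuffer;
import java.nio.channels.CompletionHandler;
import java.util.List;
import java.util.function.Consumer;

import org.apache.hadoop.fs.FileRange;

/**
 * Illustrative only: a completion handler whose failed() callback hands the
 * unread buffer back through the release consumer and completes the range's
 * future exceptionally.
 */
class ReleasingReadHandler implements CompletionHandler<Integer, Integer> {
  private final List<? extends FileRange> ranges;
  private final ByteBuffer[] buffers;
  private final Consumer<ByteBuffer> release;

  ReleasingReadHandler(List<? extends FileRange> ranges,
      ByteBuffer[] buffers,
      Consumer<ByteBuffer> release) {
    this.ranges = ranges;
    this.buffers = buffers;
    this.release = release;
  }

  @Override
  public void completed(Integer bytesRead, Integer index) {
    // a real handler must cope with partial reads; elided in this sketch
    buffers[index].flip();
    ranges.get(index).getData().complete(buffers[index]);
  }

  @Override
  public void failed(Throwable exc, Integer index) {
    // return the unread buffer via the caller-supplied release function,
    // then surface the failure through the range's future
    release.accept(buffers[index]);
    ranges.get(index).getData().completeExceptionally(exc);
  }
}
```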
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java:
##########
@@ -136,4 +137,30 @@ default void readVectored(List<? extends FileRange> ranges,
      IntFunction<ByteBuffer> allocate) throws IOException {
VectoredReadUtils.readVectored(this, ranges, allocate);
}
+
+ /**
+ * Variant of {@link #readVectored(List, IntFunction)} where a release() function
+ * may be invoked if problems surface during reads - this method is called to
+ * try to return any allocated buffer which has not been read yet.
+ * Buffers which have successfully been read and returned to the caller do not
+ * get released: this is for failures only.
+ * <p>
+ * The default implementation calls readVectored/2 so as to ensure that
+ * if an existing stream implementation does not implement this method
+ * all is good.
+ * <p>
+ * Implementations SHOULD override this method if they can release buffers as
+ * part of their error handling.
+ * @param ranges the byte ranges to read
+ * @param allocate the function to allocate ByteBuffer
Review Comment:
nit: remove "the"
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java:
##########
@@ -136,4 +137,30 @@ default void readVectored(List<? extends FileRange> ranges,
      IntFunction<ByteBuffer> allocate) throws IOException {
VectoredReadUtils.readVectored(this, ranges, allocate);
}
+
+ /**
+ * Variant of {@link #readVectored(List, IntFunction)} where a release() function
+ * may be invoked if problems surface during reads - this method is called to
+ * try to return any allocated buffer which has not been read yet.
+ * Buffers which have successfully been read and returned to the caller do not
+ * get released: this is for failures only.
+ * <p>
+ * The default implementation calls readVectored/2 so as to ensure that
+ * if an existing stream implementation does not implement this method
+ * all is good.
+ * <p>
+ * Implementations SHOULD override this method if they can release buffers as
+ * part of their error handling.
+ * @param ranges the byte ranges to read
+ * @param allocate the function to allocate ByteBuffer
+ * @param release the function to release a ByteBuffer.
Review Comment:
nit: remove "the"
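
As a usage illustration of the overload documented above (the stream, offsets and lengths below are made up; createFileRange and the two-argument readVectored are existing Hadoop APIs, the three-argument readVectored is what this PR adds):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;

/** Sketch of a caller passing a release function alongside the allocator. */
final class ReadVectoredWithReleaseExample {
  static void read(FSDataInputStream in) throws IOException {
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(0, 4096),
        FileRange.createFileRange(1 << 20, 8192));
    // on failure the stream may hand unread buffers back through the
    // release callback instead of leaking them
    in.readVectored(ranges, ByteBuffer::allocate,
        buffer -> { /* e.g. return the buffer to a pool */ });
    for (FileRange range : ranges) {
      ByteBuffer data = range.getData().join();
      // ... consume data ...
    }
  }
}
```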
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/VectorIOBufferPool.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.impl;
+
+import java.nio.ByteBuffer;
+import java.util.function.Consumer;
+import java.util.function.IntFunction;
+
+import org.apache.hadoop.io.ByteBufferPool;
+
+import static java.util.Objects.requireNonNull;
+
+/**
+ * A ByteBufferPool implementation that uses a pair of functions to allocate
+ * and release ByteBuffers; intended for use implementing the VectorIO API
+ * as it makes the pair of functions easier to pass around and use in
+ * existing code.
+ * <p>
+ * No matter what kind of buffer is requested, the allocation function
+ * is invoked; that is: the direct flag is ignored.
+ */
+public final class VectorIOBufferPool implements ByteBufferPool {
Review Comment:
Why do we need this when we have WeakReferencedElasticByteBufferPool? An
actual implementation of the release function should call putBuffer() of the
WeakReferencedElasticByteBufferPool, no?
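
A minimal sketch of that wiring, assuming the caller owns the pool (the stream and range list are stand-ins): the allocate and release arguments can be bound directly to WeakReferencedElasticByteBufferPool's getBuffer()/putBuffer().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.io.WeakReferencedElasticByteBufferPool;

/** Sketch: wire an existing buffer pool straight into readVectored/3. */
final class PoolBackedVectoredRead {
  static void read(PositionedReadable stream, List<FileRange> ranges)
      throws IOException {
    WeakReferencedElasticByteBufferPool pool =
        new WeakReferencedElasticByteBufferPool();
    stream.readVectored(ranges,
        length -> pool.getBuffer(false, length),  // allocate heap buffers
        pool::putBuffer);                         // release back to the pool
  }
}
```

The VectorIOBufferPool class under review goes in the opposite direction: per its javadoc it wraps such a pair of functions as a ByteBufferPool so they are easier to pass around in existing pool-oriented code.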
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:
##########
@@ -52,6 +53,15 @@ public final class VectoredReadUtils {
private static final Logger LOG =
LoggerFactory.getLogger(VectoredReadUtils.class);
+ /**
+ * This releaser just logs at debug that the buffer
+ * was released.
+ */
+ public static final Consumer<ByteBuffer> LOG_BYTE_BUFFER_RELEASED =
+ (buffer) -> {
+ LOG.debug("release buffer {}", buffer.toString());
Review Comment:
Should we parameterize this and add "releasing buffer for range [x-y]"?
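
One possible reading of that suggestion (the helper below is hypothetical and not in the PR): build the releaser per range so the debug message can name the range it belongs to.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.hadoop.fs.FileRange;

/** Hypothetical helper: a releaser that logs which range owned the buffer. */
final class RangeLoggingRelease {
  private static final Logger LOG =
      LoggerFactory.getLogger(RangeLoggingRelease.class);

  static Consumer<ByteBuffer> forRange(FileRange range) {
    return buffer -> LOG.debug("releasing buffer for range [{} - {}]",
        range.getOffset(), range.getOffset() + range.getLength());
  }
}
```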
> VectorIO API to support releasing buffers on failure
> ----------------------------------------------------
>
> Key: HADOOP-19303
> URL: https://issues.apache.org/jira/browse/HADOOP-19303
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Extend the vector IO API with a method that takes a ByteBufferPool
> implementation rather than just an allocator. This allows buffers to be
> returned to the pool when problems occur, before an exception is thrown.
> The Parquet API is already designed for this.
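
To make the description concrete, here is a sketch of the adapter direction it alludes to: presenting an (allocate, release) function pair as a ByteBufferPool so that pool-oriented callers such as Parquet's API can be bridged. This mirrors the intent of the VectorIOBufferPool class quoted above; the class name and details below are illustrative, not the committed code.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;
import java.util.function.IntFunction;

import org.apache.hadoop.io.ByteBufferPool;

/** Illustrative adapter: expose an allocate/release pair as a ByteBufferPool. */
final class FunctionBackedBufferPool implements ByteBufferPool {
  private final IntFunction<ByteBuffer> allocate;
  private final Consumer<ByteBuffer> release;

  FunctionBackedBufferPool(IntFunction<ByteBuffer> allocate,
      Consumer<ByteBuffer> release) {
    this.allocate = allocate;
    this.release = release;
  }

  @Override
  public ByteBuffer getBuffer(boolean direct, int length) {
    // as in the quoted VectorIOBufferPool javadoc, the direct flag is
    // ignored: whatever the allocate function returns is used
    return allocate.apply(length);
  }

  @Override
  public void putBuffer(ByteBuffer buffer) {
    release.accept(buffer);
  }
}
```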
--
This message was sent by Atlassian Jira
(v8.20.10#820010)