[jira] [Commented] (HADOOP-19303) VectorIO API to support releasing buffers on failure

ASF GitHub Bot (Jira) Mon, 24 Feb 2025 16:20:21 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929939#comment-17929939
 ]


ASF GitHub Bot commented on HADOOP-19303:
-----------------------------------------

cnauroth commented on PR #7418:
URL: https://github.com/apache/hadoop/pull/7418#issuecomment-2680008490

   > > @cnauroth @anujmodi2021 have either of you two implemented the vector 
read API yet?
   > > I ask as this PR currently maps the readVectored/3 call to the 
readVectored/2 call unless overridden, so the default implementation will leak 
buffers on failure, even if a release function is passed in.
   > > If I change it to pass the release call down, then any input stream 
which implemented readVectored/2 will not have the readVectored/3 call invoking 
it, unless it overrides that explicitly too. In this PR, everything in hadoop 
common does override it, and I will do the same in S3AInputStream.
   > > I'm just trying to work out the best design for other streams. If all 
the implementations are in the Hadoop source tree, I can do the overrides there 
and have a default which does release buffers everywhere else.
   > > 
   > > * @mukund-thakur @ahmarsuhail @saikatroy038 @shameersss1
   > 
   > Hi @steveloughran, I am working on the vectored read API feature from the 
ABFS driver team. We are still working on the design part of the feature and 
will pick up the implementation soon.
   
   Hello @steveloughran !
   
   GCS has an implementation of vectored read, overriding readVectored/2 here:
   
   
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java#L176
   
   Implementation details here:
   
   
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/VectoredIOImpl.java
   
   This is on the master branch and 3.0 release line, which is not yet in 
mainstream Dataproc use. We don't have vectored read in version 2.2 or earlier.
   
   It sounds like once this change is in a Hadoop release, GCS should plan on 
picking this up and overriding readVectored/3. Do I have it right?
   
   CC: @arunkumarchacko
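
The default-method trade-off discussed in this comment thread can be illustrated with a minimal, self-contained Java sketch. It does not use the Hadoop classes: `VectoredReadable`, `CountingPool`, `LegacyStream` and `ReleasingStream` are hypothetical stand-ins, ranges are reduced to a list of lengths, and the per-range results of the real API are collapsed into one synchronous call that releases every buffer it allocated when it fails. Under those assumptions, it shows that a readVectored/3 default which maps to readVectored/2 never invokes the release callback, while a stream that overrides readVectored/3 can return its buffers before rethrowing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

public class VectoredReadSketch {

  /** Hypothetical stand-in for a stream with both overloads; not the Hadoop interface. */
  interface VectoredReadable {
    /** readVectored/2: the caller supplies only an allocator. */
    void readVectored(List<Integer> rangeLengths, IntFunction<ByteBuffer> allocate)
        throws IOException;

    /** readVectored/3: this default maps to readVectored/2, so release is never called. */
    default void readVectored(List<Integer> rangeLengths, IntFunction<ByteBuffer> allocate,
        Consumer<ByteBuffer> release) throws IOException {
      readVectored(rangeLengths, allocate);
    }
  }

  /** Counts buffers handed out and not yet returned. */
  static class CountingPool {
    private int outstanding;
    ByteBuffer allocate(int size) { outstanding++; return ByteBuffer.allocate(size); }
    void release(ByteBuffer buffer) { outstanding--; }
    int outstanding() { return outstanding; }
  }

  /** Implements only readVectored/2 and simulates a failure on one range. */
  static class LegacyStream implements VectoredReadable {
    private final int failAt;
    LegacyStream(int failAt) { this.failAt = failAt; }

    void readRange(int index, ByteBuffer buffer) throws IOException {
      if (index == failAt) {
        throw new IOException("simulated failure on range " + index);
      }
    }

    @Override
    public void readVectored(List<Integer> rangeLengths, IntFunction<ByteBuffer> allocate)
        throws IOException {
      for (int i = 0; i < rangeLengths.size(); i++) {
        readRange(i, allocate.apply(rangeLengths.get(i)));
      }
    }
  }

  /** Also overrides readVectored/3 so buffers go back to the pool on failure. */
  static class ReleasingStream extends LegacyStream {
    ReleasingStream(int failAt) { super(failAt); }

    @Override
    public void readVectored(List<Integer> rangeLengths, IntFunction<ByteBuffer> allocate,
        Consumer<ByteBuffer> release) throws IOException {
      List<ByteBuffer> allocated = new ArrayList<>();
      try {
        for (int i = 0; i < rangeLengths.size(); i++) {
          ByteBuffer buffer = allocate.apply(rangeLengths.get(i));
          allocated.add(buffer);
          readRange(i, buffer);
        }
      } catch (IOException e) {
        // Simplification: hand back everything allocated by this call, then rethrow.
        allocated.forEach(release);
        throw e;
      }
    }
  }

  /** Calls readVectored/3 and reports how many buffers were never returned. */
  static int outstandingAfterFailure(VectoredReadable stream) {
    CountingPool pool = new CountingPool();
    try {
      stream.readVectored(Arrays.asList(16, 32, 64), pool::allocate, pool::release);
    } catch (IOException expected) {
      // The simulated failure is the point of the demo.
    }
    return pool.outstanding();
  }

  public static void main(String[] args) {
    System.out.println("readVectored/2 only: " + outstandingAfterFailure(new LegacyStream(1)));
    System.out.println("readVectored/3 override: " + outstandingAfterFailure(new ReleasingStream(1)));
  }
}
```

With the failure simulated on the second range, `main` reports 2 outstanding buffers for the stream that only implements readVectored/2 and 0 for the stream that overrides readVectored/3.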
   




> VectorIO API to support releasing buffers on failure
> ----------------------------------------------------
>
>                 Key: HADOOP-19303
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19303
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>
> Extend the vector IO API with a method that takes a ByteBufferPool 
> implementation rather than just an allocator. This allows buffers to be 
> returned to the pool when problems occur, before throwing an exception.
> The Parquet API is already designed for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
