[
https://issues.apache.org/jira/browse/HADOOP-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929939#comment-17929939
]
ASF GitHub Bot commented on HADOOP-19303:
-----------------------------------------
cnauroth commented on PR #7418:
URL: https://github.com/apache/hadoop/pull/7418#issuecomment-2680008490
> > @cnauroth @anujmodi2021 have either of you two implemented the vector
> > read API yet?
> > I ask as this PR currently maps the readVectored/3 call to the
> > readVectored/2 call unless overridden, so the default implementation will
> > leak buffers on failure, even if a release function is passed in.
> > If I change it to pass the release call down, then any input stream
> > which implemented readVectored/2 will not have the readVectored/3 call
> > invoking it, unless they override that explicitly too. In this PR,
> > everything in hadoop common does, and I will in S3AInputStream.
> > I'm just trying to work out the best design for other streams. If all
> > the implementations are in the hadoop source tree, I can do the overrides
> > there and have a default which does release buffers everywhere else.
> >
> > * @mukund-thakur @ahmarsuhail @saikatroy038 @shameersss1
>
> Hi @steveloughran, I am working on the vectored read API feature from the
> ABFS driver team. We are still working on the design part of the feature and
> will pick up the implementation soon.

Hello @steveloughran!
GCS has an implementation of vectored read, overriding readVectored/2 here:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java#L176
Implementation details here:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/VectoredIOImpl.java
This is on the master branch and 3.0 release line, which is not yet in
mainstream Dataproc use. We don't have vectored read in version 2.2 or earlier.
It sounds like once this change is in a Hadoop release, GCS should plan on
picking this up and overriding readVectored/3. Do I have it right?
CC: @arunkumarchacko
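
To make the trade-off quoted above concrete, here is a minimal, self-contained sketch of the default-delegation problem. It does not use the Hadoop classes: `Range`, `VectoredReadable`, `LegacyStream`, `ReleasingStream` and `leakedBuffers` are hypothetical stand-ins invented for illustration, and the real signatures in the PR may differ. It assumes a three-argument overload whose default simply forwards to the two-argument one, and compares a stream that only implements readVectored/2 with one that also overrides readVectored/3.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.IntFunction;

public class VectoredReadSketch {

  /** Hypothetical stand-in for a file range; not the Hadoop FileRange type. */
  public static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Hypothetical stand-in for a stream interface offering both overloads. */
  public interface VectoredReadable {
    /** "readVectored/2": allocator only. */
    void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate);

    /** "readVectored/3": the default forwards to /2, so release is never called. */
    default void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate,
        Consumer<ByteBuffer> release) {
      readVectored(ranges, allocate);
    }
  }

  /** Simulated read: a negative offset stands in for an I/O failure. */
  static void simulateRead(Range range, ByteBuffer buffer) {
    if (range.offset < 0) {
      throw new IllegalArgumentException("negative offset: " + range.offset);
    }
  }

  /** Implements only /2, as a connector written before /3 existed would. */
  public static class LegacyStream implements VectoredReadable {
    @Override
    public void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate) {
      for (Range range : ranges) {
        ByteBuffer buffer = allocate.apply(range.length);
        simulateRead(range, buffer);
      }
    }
  }

  /** Also overrides /3 and hands every allocated buffer back on failure. */
  public static class ReleasingStream extends LegacyStream {
    @Override
    public void readVectored(List<Range> ranges, IntFunction<ByteBuffer> allocate,
        Consumer<ByteBuffer> release) {
      List<ByteBuffer> allocated = new ArrayList<>();
      try {
        for (Range range : ranges) {
          ByteBuffer buffer = allocate.apply(range.length);
          allocated.add(buffer);
          simulateRead(range, buffer);
        }
      } catch (RuntimeException e) {
        allocated.forEach(release);
        throw e;
      }
    }
  }

  /** Counts buffers allocated but never released after a failing read. */
  public static int leakedBuffers(VectoredReadable stream) {
    AtomicInteger outstanding = new AtomicInteger();
    IntFunction<ByteBuffer> allocate = n -> {
      outstanding.incrementAndGet();
      return ByteBuffer.allocate(n);
    };
    Consumer<ByteBuffer> release = b -> outstanding.decrementAndGet();
    // The second range has a negative offset, so the read fails after two allocations.
    List<Range> ranges = Arrays.asList(new Range(0, 16), new Range(-1, 16));
    try {
      stream.readVectored(ranges, allocate, release);
    } catch (IllegalArgumentException expected) {
      // The failure is the scenario under test.
    }
    return outstanding.get();
  }

  public static void main(String[] args) {
    System.out.println("legacy stream leaked: " + leakedBuffers(new LegacyStream()));
    System.out.println("releasing stream leaked: " + leakedBuffers(new ReleasingStream()));
  }
}
```

In this sketch the legacy stream leaves both buffers outstanding even though a release function was supplied, while the overriding stream returns them all. That matches the reading that a connector which only overrides readVectored/2 would need to override readVectored/3 as well to get release-on-failure behaviour.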
> VectorIO API to support releasing buffers on failure
> ----------------------------------------------------
>
> Key: HADOOP-19303
> URL: https://issues.apache.org/jira/browse/HADOOP-19303
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Extend the vector IO API with a method that takes a ByteBufferPool
> implementation rather than just an allocator. This allows buffers to be
> returned to the pool when problems occur, before throwing an exception.
> The Parquet API is already designed for this.