[
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824166#comment-17824166
]
Steve Loughran commented on HADOOP-19101:
-----------------------------------------
* tests didn't validate the default impl, just native/local FS and s3a, all of
which get it right.
* I'd added the abfs contract tests and things blew up or hung (see the PR
there), but before looking at those I was trying to follow the VectoredReadUtils
stuff and make sense of what was passed down and concluded that either I didn't
understand it *or* the code was broken. After a while I concluded it had to be
the latter: the code was broken.
> Vectored Read into off-heap buffer broken in fallback implementation
> --------------------------------------------------------------------
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/azure
> Affects Versions: 3.4.0, 3.3.6
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at
> position zero even when the range is at a different offset. As a result, you
> can get the wrong data back.
> The fix for this is straightforward: we pass in a FileRange and use its offset
> as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we
> have never seen this in production because the parquet and ORC libraries both
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to
> read into off-heap DirectBuffers. This is a bit trickier than you would think
> because an allocator is passed in. For PARQUET-2171 we will
> * only invoke the API on streams which explicitly declare their support for
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.
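
To illustrate the fix described in the issue, here is a minimal, self-contained sketch of a chunked read into a direct buffer that starts at the range's offset rather than at position zero. It is not the Hadoop code: the class name DirectRangeRead, the PositionedReader interface, the readRange method and the tiny TMP_BUFFER_MAX_SIZE value are all hypothetical stand-ins chosen for the example, and an in-memory byte array plays the role of the file.

```java
import java.nio.ByteBuffer;

public class DirectRangeRead {

  /** Hypothetical stand-in for a positioned "read fully" call on a stream. */
  interface PositionedReader {
    void readFully(long position, byte[] dest, int destOffset, int length);
  }

  /** Deliberately small so the demo needs several chunks (assumed value). */
  static final int TMP_BUFFER_MAX_SIZE = 8;

  /**
   * Copy rangeLength bytes, starting at rangeOffset in the source,
   * into the (possibly direct) buffer via a temporary on-heap array.
   */
  static void readRange(PositionedReader reader, long rangeOffset,
      int rangeLength, ByteBuffer buffer) {
    if (rangeLength == 0) {
      return;
    }
    byte[] tmp = new byte[Math.min(TMP_BUFFER_MAX_SIZE, rangeLength)];
    // Start at the range's offset; the bug reported in the issue is the
    // equivalent of starting at position 0 regardless of the range.
    long position = rangeOffset;
    int readBytes = 0;
    while (readBytes < rangeLength) {
      int chunk = Math.min(tmp.length, rangeLength - readBytes);
      reader.readFully(position, tmp, 0, chunk);
      buffer.put(tmp, 0, chunk);
      position += chunk;
      readBytes += chunk;
    }
  }

  public static void main(String[] args) {
    // A fake 64-byte "file" in which byte i has the value i.
    byte[] file = new byte[64];
    for (int i = 0; i < file.length; i++) {
      file[i] = (byte) i;
    }
    PositionedReader reader =
        (pos, dest, off, len) -> System.arraycopy(file, (int) pos, dest, off, len);

    ByteBuffer buffer = ByteBuffer.allocateDirect(20);
    readRange(reader, 40, 20, buffer);
    buffer.flip();
    // With the offset honoured, the buffer holds bytes 40..59.
    System.out.println(buffer.get(0) + " .. " + buffer.get(19)); // prints 40 .. 59
  }
}
```

If position were initialised to 0 instead of rangeOffset, the same call would fill the buffer with bytes 0..19, which is the kind of silently wrong result the issue describes.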