[
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sahil Takiar resolved IMPALA-9606.
----------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> ABFS reads should use hdfsPreadFully
> ------------------------------------
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't
> significantly improve performance. After some more investigation into the
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS
> reads.
>
> The ABFS client uses a different model for fetching data than S3A does.
> Details are beyond the scope of this JIRA, but the difference is related to
> an ABFS feature called "read-aheads": ABFS has logic to pre-fetch data it
> *thinks* the client will need. By default, it pre-fetches
> (number of cores) * 4 MB of data. If the requested data is already in the
> client cache, it is served from the cache.
>
> Even so, there is no real drawback to using {{hdfsPreadFully}} for ABFS
> reads. It's definitely safer: the current implementation of ABFS always
> returns the amount of data requested, but only the {{hdfsPreadFully}} API
> guarantees it.
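
To illustrate the guarantee discussed above, here is a minimal sketch of the "read fully" pattern: keep issuing positional reads until the whole requested range has arrived. It is not the libhdfs or Impala implementation. PreadFn, PreadFully and MakeShortReader are hypothetical names made up for this example, and the 0 / -1 return convention is an assumption chosen for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical positional-read callback (not a libhdfs type). It copies up to
// `length` bytes starting at `offset` into `buf` and returns the number of
// bytes copied, 0 at end of data, or -1 on error. Like a plain pread, it is
// allowed to return fewer bytes than requested.
using PreadFn =
    std::function<int64_t(int64_t offset, char* buf, int64_t length)>;

// Calls `pread` repeatedly until exactly `length` bytes have been read.
// Returns 0 on success, -1 on error or if the data ends early (assumed
// convention for this sketch).
int PreadFully(const PreadFn& pread, int64_t offset, char* buf,
               int64_t length) {
  assert(length >= 0);
  int64_t done = 0;
  while (done < length) {
    int64_t n = pread(offset + done, buf + done, length - done);
    if (n <= 0) return -1;  // error, or end of data before `length` bytes
    done += n;
  }
  return 0;
}

// Hypothetical in-memory reader that returns at most `max_chunk` bytes per
// call, to simulate a stream that performs short reads.
PreadFn MakeShortReader(const std::string& data, int64_t max_chunk) {
  return [data, max_chunk](int64_t offset, char* buf,
                           int64_t length) -> int64_t {
    const int64_t size = static_cast<int64_t>(data.size());
    if (offset < 0 || length < 0) return -1;
    if (offset >= size) return 0;
    const int64_t n = std::min({length, max_chunk, size - offset});
    std::memcpy(buf, data.data() + offset, static_cast<std::size_t>(n));
    return n;
  };
}
```

With a reader capped at 3 bytes per call, a single call for 6 bytes returns only 3, while PreadFully either fills all 6 bytes or reports failure. That is the difference between relying on a stream's current behaviour and relying on an API contract.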