[ https://issues.apache.org/jira/browse/HADOOP-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mukund Thakur updated HADOOP-17250:
-----------------------------------
Description:
Random reads with a small amount of read-ahead were seen to improve performance
for a TPC-H query.
Introducing the fs.azure.readahead.range parameter, which can be set by the user.
Data will be populated in the buffer for random reads as well, which leads to
fewer remote calls.
This patch also changes the seek implementation to perform a lazy seek. The
actual seek is done when a read is initiated and the data is not present in the
buffer; otherwise data is returned from the buffer, thus reducing the number of
remote calls.
was:
Random reads with a small amount of read-ahead were seen to improve performance
for a TPC-H query.
> ABFS: Random read perf improvement
> ----------------------------------
>
> Key: HADOOP-17250
> URL: https://issues.apache.org/jira/browse/HADOOP-17250
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.0
> Reporter: Sneha Vijayarajan
> Assignee: Mukund Thakur
> Priority: Major
> Labels: abfsactive, pull-request-available
> Fix For: 3.3.2
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Random reads with a small amount of read-ahead were seen to improve
> performance for a TPC-H query.
>
> Introducing the fs.azure.readahead.range parameter, which can be set by the
> user. Data will be populated in the buffer for random reads as well, which
> leads to fewer remote calls.
> This patch also changes the seek implementation to perform a lazy seek. The
> actual seek is done when a read is initiated and the data is not present in
> the buffer; otherwise data is returned from the buffer, thus reducing the
> number of remote calls.
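
For illustration only, the lazy seek and read-ahead behaviour described above
can be sketched roughly as follows. This is not the ABFS code from the patch:
the class LazySeekStream, its fields, and the remoteCalls counter are
hypothetical names, an in-memory byte array stands in for the remote store, and
the readAheadRange constructor argument merely plays the role that
fs.azure.readahead.range plays in the description.

```java
import java.util.Arrays;

/** Hypothetical sketch of lazy seek plus read-ahead; not the ABFS implementation. */
public class LazySeekStream {
    private final byte[] remote;        // stands in for the remote file (assumption)
    private final int readAheadRange;   // bytes fetched per remote call (assumption)
    private byte[] buffer = new byte[0];
    private long bufferStart = 0;       // file offset of buffer[0]
    private long nextReadPos = 0;       // position recorded by seek()
    private int remoteCalls = 0;        // counts simulated remote reads

    public LazySeekStream(byte[] remote, int readAheadRange) {
        if (readAheadRange < 1) {
            throw new IllegalArgumentException("readAheadRange must be >= 1");
        }
        this.remote = remote;
        this.readAheadRange = readAheadRange;
    }

    /** Lazy seek: only records the target position, no remote call is made. */
    public void seek(long pos) {
        if (pos < 0 || pos > remote.length) {
            throw new IllegalArgumentException("seek position out of range: " + pos);
        }
        nextReadPos = pos;
    }

    /** Reads one byte as an int in 0..255, or returns -1 at end of file. */
    public int read() {
        if (nextReadPos >= remote.length) {
            return -1;
        }
        boolean inBuffer = nextReadPos >= bufferStart
                && nextReadPos < bufferStart + buffer.length;
        if (!inBuffer) {
            // The actual "seek" happens here, only when the data is not buffered.
            fetch(nextReadPos);
        }
        int b = buffer[(int) (nextReadPos - bufferStart)] & 0xff;
        nextReadPos++;
        return b;
    }

    /** Simulated remote read of up to readAheadRange bytes starting at pos. */
    private void fetch(long pos) {
        int end = (int) Math.min(remote.length, pos + readAheadRange);
        buffer = Arrays.copyOfRange(remote, (int) pos, end);
        bufferStart = pos;
        remoteCalls++;
    }

    public int getRemoteCalls() {
        return remoteCalls;
    }
}
```

In this sketch, repeated seek() calls cost nothing, and a read that lands inside
the previously fetched range is served from the buffer without another
simulated remote call.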