[ 
https://issues.apache.org/jira/browse/HADOOP-19229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19229:
------------------------------------
    Description: 
Vector IO has a maximum size for coalesced ranges, but it also needs a maximum 
gap between ranges to justify the merge. Right now we could have a read where 
two ranges of 8 bytes each are merged with a 1 MB gap between them, and that's 
wasteful. 
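
To make the proposed rule concrete, here is a minimal sketch of a merge decision that checks both limits. It is not Hadoop's implementation: the class `RangeCoalescer`, the `Range` type and the parameter names `maxGap` and `maxMergedSize` are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the Hadoop vector IO code.
class RangeCoalescer {

    /** A byte range [offset, offset + length). Hypothetical type. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /**
     * Merge neighbouring ranges only when the gap between them is at most
     * maxGap AND the merged range is at most maxMergedSize bytes long.
     */
    static List<Range> coalesce(List<Range> ranges, long maxGap, long maxMergedSize) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> result = new ArrayList<>();
        Range current = null;
        for (Range next : sorted) {
            if (current == null) {
                current = next;
                continue;
            }
            // Overlapping ranges have a negative distance; treat that as no gap.
            long gap = Math.max(0L, next.offset - current.end());
            long mergedLength = Math.max(current.end(), next.end()) - current.offset;
            if (gap <= maxGap && mergedLength <= maxMergedSize) {
                current = new Range(current.offset, mergedLength);
            } else {
                result.add(current);
                current = next;
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }
}
```

With two 8-byte ranges separated by 1 MiB and a 2 MiB size limit, passing `Long.MAX_VALUE` as `maxGap` yields a single merged range of 1,048,592 bytes, while a `maxGap` of 128 KiB keeps the two ranges separate.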

We could also consider an "efficiency" metric that looks at the ratio of 
bytes-read to bytes-discarded. Not sure what we'd do with it, but we could 
track it as an IOStat.
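
One possible shape for such a metric, as a sketch only. It assumes "bytes-read" means the bytes the caller actually asked for, and it reports the useful fraction of everything fetched rather than a raw ratio, so that a read with nothing discarded does not divide by zero. The class and method names are hypothetical.

```java
// Hypothetical illustration only; not an existing Hadoop statistic.
class ReadEfficiency {

    /**
     * Fraction of fetched bytes that the caller asked for.
     * 1.0 means nothing was discarded; values near 0 mean mostly waste.
     */
    static double usefulFraction(long bytesRequested, long bytesDiscarded) {
        long total = bytesRequested + bytesDiscarded;
        if (total == 0) {
            // Nothing was read at all, so nothing was wasted.
            return 1.0;
        }
        return (double) bytesRequested / total;
    }
}
```

For the example above (16 requested bytes, 1 MiB discarded) the fraction is 16 / 1,048,592, or about 0.0015%.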

h2. Current values

The thresholds at which adjacent vector IO read ranges are coalesced into a
single range have been increased, as has the limit at which ranges are 
considered large enough that parallel reads are faster.

* The min/max for local filesystems and any other FS without custom support are now 16K and 1M.
* s3a and abfs use 128K as the minimum size and 2M as the maximum.
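
The values above can be written out as a small lookup table. This is a sketch under stated assumptions: K and M are taken to mean 1024 and 1024 * 1024 bytes, the `abfss` scheme is assumed to share the abfs values, and the class and method names are hypothetical rather than Hadoop API.

```java
// Hypothetical illustration only; the names are not Hadoop API.
class VectorIoDefaults {
    static final long KB = 1024L;          // assumption: K = 1024 bytes
    static final long MB = 1024L * KB;     // assumption: M = 1024 * 1024 bytes

    /** True for the stores listed above as having custom values. */
    static boolean isTunedObjectStore(String scheme) {
        // "abfss" is included on the assumption that it matches "abfs".
        return "s3a".equals(scheme) || "abfs".equals(scheme) || "abfss".equals(scheme);
    }

    /** Minimum size: ranges closer together than this are coalesced. */
    static long minSeek(String scheme) {
        return isTunedObjectStore(scheme) ? 128 * KB : 16 * KB;
    }

    /** Maximum size of a coalesced range. */
    static long maxMergedSize(String scheme) {
        return isTunedObjectStore(scheme) ? 2 * MB : 1 * MB;
    }
}
```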




  was:
vector iO has a max size to coalesce ranges, but it also needs a maximum gap 
between ranges to justify the merge. Right now we could have a read where two 
vectors of size 8 bytes can be merged with a 1 MB gap between them -and that's 
wasteful. 

We could also consider an "efficiency" metric which looks at the ratio of 
bytes-read to bytes-discarded. Not sure what we'd do with it, but we could 
track it as an IOStat


> Vector IO on cloud storage: what is a good minimum seek size?
> -------------------------------------------------------------
>
>                 Key: HADOOP-19229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19229
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>
> Vector IO has a maximum size for coalesced ranges, but it also needs a maximum 
> gap between ranges to justify the merge. Right now we could have a read where 
> two ranges of 8 bytes each are merged with a 1 MB gap between them, and 
> that's wasteful. 
> We could also consider an "efficiency" metric that looks at the ratio of 
> bytes-read to bytes-discarded. Not sure what we'd do with it, but we could 
> track it as an IOStat.
> h2. Current values
> The thresholds at which adjacent vector IO read ranges are coalesced into a
> single range have been increased, as has the limit at which ranges are 
> considered large enough that parallel reads are faster.
> * The min/max for local filesystems and any other FS without custom support are now 16K and 1M.
> * s3a and abfs use 128K as the minimum size and 2M as the maximum.


