[ 
https://issues.apache.org/jira/browse/HBASE-27896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault updated HBASE-27896:
--------------------------------------
    Description: 
In https://issues.apache.org/jira/browse/HBASE-17914, the flag 
{{{}hbase.store.reader.no-readahead{}}} was introduced. The default is false, so 
readahead is enabled. This flag is used when creating the default store reader 
(i.e. the one used by PREAD reads). Stream readers don't use this flag; they 
always pass -1.

When that flag is true, we pass a readahead value of 0 to 
FSDataInputStream.setReadahead. When the flag is false, we pass -1, which 
triggers the HDFS default behavior: a readahead of 4MB.
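
The mapping described above can be sketched as follows. This is illustrative only; {{readaheadFor}} is a hypothetical helper, not the actual HBase internals:

```java
// Sketch of the flag-to-readahead mapping described in this issue.
// readaheadFor is a hypothetical helper name, not real HBase code.
public class ReadaheadSelection {
    // 0 disables readahead entirely; -1 defers to the HDFS default,
    // which is a 4MB readahead buffer.
    static long readaheadFor(boolean noReadahead) {
        return noReadahead ? 0L : -1L;
    }

    public static void main(String[] args) {
        System.out.println(readaheadFor(true));   // no-readahead=true  -> 0
        System.out.println(readaheadFor(false));  // no-readahead=false -> -1
    }
}
```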

It seems to me that we don't want readahead for PREAD reads, and especially not 
such a large readahead. Our default block size is 64KB, which is much smaller 
than that. A PREAD read is not likely to do sequential IO, so it is not likely 
to utilize the cached readahead buffer.

I set no-readahead to true on a few of our clusters and in each case saw a 
massive reduction in disk IO and thus an increase in throughput. I load tested 
this in a test cluster which does fully random reads of ~300 byte rows on a 
dataset 20x larger than memory. The load test was able to achieve nearly double 
the throughput.
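
For anyone who wants to try this ahead of any default change, the flag can be 
enabled in hbase-site.xml:

```xml
<property>
  <name>hbase.store.reader.no-readahead</name>
  <value>true</value>
</property>
```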

As a follow-on, we might consider tuning the readahead for STREAM reads. 4MB 
seems way too big for many common workloads.

  was:
In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced 
{{{}hbase.store.reader.no-readahead{}}}. The default is false, so readahead is 
enabled. This flag is used for creating the default store reader (i.e. the one 
used by PREAD reads). Stream readers don't use this flag, instead they always 
pass -1.

When that flag is false, we pass a readahead value of 0 to 
FSDataInputStream.setReadahead. When the flag is true, we pass -1 which 
triggers hdfs default behavior. The default behavior is to use a readahead of 
4MB.

It seems to me that we don't want readahead for PREAD reads, and especially not 
such a large readahead. Our default block size is 64kb, which is much smaller 
than that. A PREAD read is not likely to do sequential IO, so not likely to 
utilize the cached readahead buffer.

I set no-readahead to true in a few of our clusters and in each case saw a 
massive reduction in disk IO and thus increase in throughput. I load tested 
this in a test cluster which does fully random reads of ~300 byte rows on a 
dataset which is 20x larger than memory. The load test was able to achieve 
nearly double the throughput.

As a follow-on, we might consider tuning the readahead for STREAM reads. 4mb 
seems way too big for many common workloads.


> Disable hdfs readahead for pread reads
> --------------------------------------
>
>                 Key: HBASE-27896
>                 URL: https://issues.apache.org/jira/browse/HBASE-27896
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)