[https://issues.apache.org/jira/browse/HBASE-27896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Bryan Beaudreault updated HBASE-27896:
--------------------------------------
Description:
In https://issues.apache.org/jira/browse/HBASE-17914, the flag
{{hbase.store.reader.no-readahead}} was introduced. The default is false, so
readahead is enabled. This flag is used when creating the default store reader
(i.e. the one used by PREAD reads). Stream readers don't use this flag; instead,
they always pass -1.
When that flag is true, we pass a readahead value of 0 to
FSDataInputStream.setReadahead. When the flag is false, we pass -1, which
triggers the HDFS default behavior: a readahead of 4MB.
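A minimal sketch of the mapping described above, written as standalone Java rather than HBase's actual code. The ReadType enum, the method names, and the hard-coded 4MB default are hypothetical stand-ins; only the 0 / -1 values and the PREAD-vs-STREAM behavior come from this issue.

```java
// Hypothetical model of how the requested readahead depends on the read type
// and on hbase.store.reader.no-readahead. Not HBase's real implementation.
public class ReadaheadSketch {

    // Hypothetical stand-in for HBase's PREAD vs STREAM distinction.
    enum ReadType { PREAD, STREAM }

    // Sentinel described in the issue: -1 means "don't override, use the HDFS default".
    static final long USE_HDFS_DEFAULT = -1L;

    // Assumed HDFS default readahead (4MB), as stated in the issue text.
    static final long ASSUMED_HDFS_DEFAULT_BYTES = 4L * 1024 * 1024;

    // Readahead value requested for a store file stream.
    static long requestedReadahead(ReadType type, boolean noReadahead) {
        if (type == ReadType.STREAM) {
            // Stream readers ignore the flag and always pass -1.
            return USE_HDFS_DEFAULT;
        }
        // The default (PREAD) reader honors the flag: true -> 0 (disabled), false -> -1.
        return noReadahead ? 0L : USE_HDFS_DEFAULT;
    }

    // Readahead in bytes that would take effect under the assumed 4MB default.
    static long effectiveReadahead(long requested) {
        return requested < 0 ? ASSUMED_HDFS_DEFAULT_BYTES : requested;
    }

    public static void main(String[] args) {
        for (ReadType type : ReadType.values()) {
            for (boolean noReadahead : new boolean[] {false, true}) {
                long requested = requestedReadahead(type, noReadahead);
                System.out.printf("%s no-readahead=%b -> requested=%d, effective=%d bytes%n",
                        type, noReadahead, requested, effectiveReadahead(requested));
            }
        }
    }
}
```

Running main walks all four combinations; only PREAD with no-readahead=true ends up with readahead disabled, while every STREAM read and the default PREAD configuration fall back to the 4MB default.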
It seems to me that we don't want readahead for PREAD reads, and especially not
such a large readahead. Our default HFile block size is 64KB, which is much
smaller than 4MB. A PREAD read is unlikely to do sequential IO, so it is
unlikely to make use of the cached readahead buffer.
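As a rough, worst-case illustration, assuming binary units, the 4MB HDFS default and 64KB block size above, and that a random PREAD uses only the one block it asked for while the rest of the readahead window goes unused:

```latex
\frac{4\,\mathrm{MB}}{64\,\mathrm{KB}} = \frac{4096\,\mathrm{KB}}{64\,\mathrm{KB}} = 64
\quad\Rightarrow\quad
\text{useful fraction of the window} \le \frac{1}{64} \approx 1.6\%
```

In other words, a single readahead window spans 64 default-sized blocks, so under these assumptions most of the prefetched data serves no read.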
I set no-readahead to true in a few of our clusters and in each case saw a
massive reduction in disk IO and a corresponding increase in throughput. I load
tested this in a test cluster that does fully random reads of ~300-byte rows on
a dataset 20x larger than memory. With the flag set, the load test achieved
nearly double the throughput.
As a follow-on, we might consider tuning the readahead for STREAM reads; 4MB
seems far too big for many common workloads.
was:
In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced
{{hbase.store.reader.no-readahead}}. The default is false, so readahead is
enabled. This flag is used for creating the default store reader (i.e. the one
used by PREAD reads). Stream readers don't use this flag, instead they always
pass -1.
When that flag is false, we pass a readahead value of 0 to
FSDataInputStream.setReadahead. When the flag is true, we pass -1 which
triggers hdfs default behavior. The default behavior is to use a readahead of
4MB.
It seems to me that we don't want readahead for PREAD reads, and especially not
such a large readahead. Our default block size is 64kb, which is much smaller
than that. A PREAD read is not likely to do sequential IO, so not likely to
utilize the cached readahead buffer.
I set no-readahead to true in a few of our clusters and in each case saw a
massive reduction in disk IO and thus increase in throughput. I load tested
this in a test cluster which does fully random reads of ~300 byte rows on a
dataset which is 20x larger than memory. The load test was able to achieve
nearly double the throughput.
As a follow-on, we might consider tuning the readahead for STREAM reads. 4mb
seems way too big for many common workloads.
> Disable hdfs readahead for pread reads
> --------------------------------------
>
> Key: HBASE-27896
> URL: https://issues.apache.org/jira/browse/HBASE-27896
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)