Thanks everyone! I'll get going on making this change in the near future.

On Fri, Jun 2, 2023 at 9:37 AM Nick Dimiduk <ndimi...@apache.org> wrote:
> Hi Bryan,
>
> Based on your observations, I'm in favor of changing the default for new
> minor releases going forward, and maybe adding some notes about this to the
> online book. I'm also in favor of exposing configurable readahead for
> STREAM reads, enabling easier experimentation for those keen to do so.
>
> Thanks,
> Nick
>
> On Fri, Jun 2, 2023 at 9:42 AM Xiaolin Ha <summer.he...@gmail.com> wrote:
>
> > +1 for disabling readahead for pread
> >
> > On Wed, May 31, 2023 at 8:44 PM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
> >
> > > Hello team,
> > >
> > > I recently discovered "hbase.store.reader.no-readahead", which defaults
> > > to false (so readahead is enabled). This only applies to PREAD reads, not
> > > STREAM reads, which always use readahead. When readahead is enabled, the
> > > default readahead amount in the DFSClient is 4mb. In my opinion this is
> > > far too large for HBase's use case.
> > >
> > > Further, reads in HBase are always for a block at a time, and blocks
> > > typically have more than one row in them, so we are already reading
> > > ahead a bit via block reads. And lastly, readahead is typically useful
> > > for sequential read scenarios. It's unlikely for someone to do
> > > sequential IO via PREAD; instead they would use Scans (thus STREAM).
> > > Even when someone does sequential IO via PREAD, they'd get some natural
> > > readahead from our reading a whole block at a time.
> > >
> > > I disabled readahead on about 50 servers across various clusters in our
> > > production environment, and saw a massive (10x or more) drop in disk IO
> > > for random read and mixed read workloads. Scan workloads were mostly
> > > unaffected, since they don't use this setting. I also ran a targeted
> > > load test on a cluster with and without readahead, and got double the
> > > random read throughput with it disabled.
> > >
> > > I'd like to update the default for this config to "true", thus disabling
> > > readahead for PREAD by default. I also think it's worth investigating
> > > making readahead configurable for STREAM reads, perhaps based on the
> > > scan's max result size or the blockBytesScanned of the last next() call.
> > >
> > > Any objections to changing the default?
> > >
> > > https://issues.apache.org/jira/browse/HBASE-27896
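For a rough sense of scale, here is a back-of-envelope sketch in Java of the worst case for a single random PREAD. It is a hypothetical model, not HBase or HDFS code. It assumes a 64 KB HFile block (the HBase column-family default BLOCKSIZE) and, pessimistically, that every random read pulls in the full 4mb readahead window mentioned in the thread. It yields an upper bound of 64x more bytes per read. The 10x-or-more drop reported above is a production observation and is not derived from this model; page cache hits and overlapping reads make the real ratio workload-dependent.

```java
/**
 * Hypothetical back-of-envelope model (not HBase/HDFS code): worst-case bytes
 * requested from disk for one random PREAD, with and without readahead.
 */
class ReadaheadEstimate {
    // Assumption: 64 KB HFile block, the HBase column-family default BLOCKSIZE.
    static final long BLOCK_SIZE_BYTES = 64L * 1024;
    // Readahead amount quoted in the thread (4mb).
    static final long READAHEAD_BYTES = 4L * 1024 * 1024;

    /**
     * Pessimistic assumption: with readahead on, each random read pulls in the
     * whole readahead window (or at least one block); with it off, one block.
     */
    static long bytesPerRead(boolean readaheadEnabled) {
        return readaheadEnabled
                ? Math.max(BLOCK_SIZE_BYTES, READAHEAD_BYTES)
                : BLOCK_SIZE_BYTES;
    }

    /** Upper bound on per-read disk IO amplification caused by readahead. */
    static long worstCaseAmplification() {
        return bytesPerRead(true) / bytesPerRead(false);
    }

    public static void main(String[] args) {
        System.out.println("with readahead:    " + bytesPerRead(true) + " bytes/read");
        System.out.println("without readahead: " + bytesPerRead(false) + " bytes/read");
        System.out.println("worst-case ratio:  " + worstCaseAmplification() + "x");
    }
}
```

The model deliberately ignores caching, so it only bounds the effect from above; it is meant to show why a 4mb window is large relative to a single-block random read, not to predict a specific cluster's savings.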