[
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bryan Beaudreault updated HBASE-26304:
--------------------------------------
Description:
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and Region's localityIndex
metric must be refreshed. Without refreshing the DFSInputStream, the improved
locality will not reduce latencies. In fact, the DFSInputStream may try to
fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is
automatically retried, but each retry temporarily increases long-tail
latencies by an amount that depends on the configured backoff strategy.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region store, re-opens the underlying StoreFile
if the locality has changed. This implementation was complicated both in
integrating callbacks into the HDFS Dispatcher and in terms of safely
re-opening StoreFiles without impacting reads or caches.
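
As a rough illustration of the per-region refresh loop described above, here
is a minimal Java sketch. It is not the actual RPC implementation:
StoreFileHandle, LocalityLookup, and refresh are hypothetical names invented
for this example, no HBase or HDFS classes are used, and "re-opening" is
simulated by updating a field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class LocalityRefreshSketch {

    /** Hypothetical stand-in for an open store file; not an HBase class. */
    static final class StoreFileHandle {
        final String path;
        float cachedLocality; // locality recorded when the reader was opened
        int reopenCount = 0;

        StoreFileHandle(String path, float cachedLocality) {
            this.path = path;
            this.cachedLocality = cachedLocality;
        }

        /** Simulates re-opening the reader so it sees fresh block locations. */
        void reopen(float freshLocality) {
            this.cachedLocality = freshLocality;
            this.reopenCount++;
        }
    }

    /** Hypothetical source of the current locality of a store file. */
    interface LocalityLookup {
        float currentLocality(String storeFilePath);
    }

    /**
     * For each named region, re-opens only those store files whose locality
     * differs from the cached value. Returns the paths that were re-opened.
     * Unknown region names are skipped.
     */
    static List<String> refresh(List<String> regionNames,
                                Map<String, List<StoreFileHandle>> storeFilesByRegion,
                                LocalityLookup lookup) {
        List<String> reopened = new ArrayList<>();
        for (String region : regionNames) {
            List<StoreFileHandle> files = storeFilesByRegion.getOrDefault(
                region, Collections.<StoreFileHandle>emptyList());
            for (StoreFileHandle file : files) {
                float fresh = lookup.currentLocality(file.path);
                if (Float.compare(fresh, file.cachedLocality) != 0) {
                    file.reopen(fresh);
                    reopened.add(file.path);
                }
            }
        }
        return reopened;
    }
}
```

The sketch leaves out the hard parts noted above: a real re-open has to swap
readers without disturbing in-flight reads or cached blocks.
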
In working to port the LocalityHealer, I'm taking a different approach:
* The part of the LocalityHealer that moves blocks will be an HDFS project
contribution.
* As such, the DFSClient should be able to recover more gracefully from block
moves.
* Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up-to-date.
I will submit a PR with that implementation, but I am also investigating other
avenues. For example, I noticed
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but
could perhaps be improved into an automatic, lower-level way of handling block
moves.
was:
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and Region's localityIndex
metric must be refreshed. Without refreshing the DFSInputStream, the improved
locality will not improve latencies. In fact, the DFSInputStream may try to
fetch blocks that have moved, resulting in a ReplicaNotFoundException. This is
automatically retried, but the retry will increase long tail latencies relative
to configured backoff strategy.
See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in
backoff strategy which can greatly mitigate latency impact of the missing block
retry.
Even with that mitigation, a StoreFile is often made up of many blocks. Without
some sort of intervention, we will continue to hit ReplicaNotFoundException
over time as clients naturally request data from moved blocks.
In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region store, re-opens the underlying StoreFile
if the locality has changed.
I will submit a PR with that implementation, but I am also investigating other
avenues. For example, I noticed
https://issues.apache.org/jira/browse/HDFS-15119 which doesn't seem ideal but
maybe can be improved as an automatic lower-level handling of block moves.
> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not reduce latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. This is automatically retried, but each retry
> temporarily increases long-tail latencies by an amount that depends on the
> configured backoff strategy.
>
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region store, re-opens the underlying StoreFile
> if the locality has changed. This implementation was complicated both in
> integrating callbacks into the HDFS Dispatcher and in terms of safely
> re-opening StoreFiles without impacting reads or caches.
> In working to port the LocalityHealer, I'm taking a different approach:
> * The part of the LocalityHealer that moves blocks will be an HDFS project
> contribution.
> * As such, the DFSClient should be able to recover more gracefully from
> block moves.
> * Additionally, HBase has some caches of block locations for locality
> reporting and the balancer. Those need to be kept up-to-date.
> I will submit a PR with that implementation, but I am also investigating
> other avenues. For example, I noticed
> https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but
> could perhaps be improved into an automatic, lower-level way of handling
> block moves.