[
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425868#comment-17425868
]
Bryan Beaudreault commented on HBASE-26304:
-------------------------------------------
I have a proof of concept working with the two HDFS issues above in a test
cluster. It works great, though as mentioned above I still need to figure out
how to update localityIndex, i.e. how to trigger computeHdfsBlockDistribution
in StoreFileInfo.
> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not improve latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. The read is retried automatically, but the retry
> increases long-tail latencies by an amount that depends on the configured
> backoff strategy.
> See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement to
> the backoff strategy that can greatly mitigate the latency impact of the
> missing-block retry.
> Even with that mitigation, a StoreFile is often made up of many blocks.
> Without some sort of intervention, we will continue to hit
> ReplicaNotFoundException over time as clients naturally request data from
> moved blocks.
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region store, re-opens the underlying StoreFile
> if the locality has changed.
> I will submit a PR with that implementation, but I am also investigating
> other avenues. For example, I noticed
> https://issues.apache.org/jira/browse/HDFS-15119, which does not seem ideal
> as is but could perhaps be improved to handle block moves automatically at a
> lower level.
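
The refresh flow described in the issue (accept a list of region names and,
for each store file, re-open it if locality has changed) might look roughly
like the sketch below. This is an illustration only, not the HBase
implementation: RefreshableStoreFile, InMemoryStoreFile, LocalityRefresher and
the change threshold are all hypothetical names and assumptions introduced
here, and none of them correspond to real HBase or HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a store file; NOT the HBase StoreFileInfo API.
interface RefreshableStoreFile {
    String path();
    float cachedLocality();         // locality recorded when the reader was opened
    float computeCurrentLocality(); // locality according to current block locations
    void reopenReader();            // replace the reader and its input stream
}

// In-memory implementation so the sketch can run without a cluster.
final class InMemoryStoreFile implements RefreshableStoreFile {
    private final String path;
    private float cached;
    private final float current;
    int reopenCount = 0;

    InMemoryStoreFile(String path, float cached, float current) {
        this.path = path;
        this.cached = cached;
        this.current = current;
    }

    public String path() { return path; }
    public float cachedLocality() { return cached; }
    public float computeCurrentLocality() { return current; }

    public void reopenReader() {
        // A real re-open would build a new reader; here we only record the
        // call and adopt the freshly computed locality.
        reopenCount++;
        cached = current;
    }
}

final class LocalityRefresher {
    // Assumption: ignore changes smaller than this to avoid needless re-opens.
    private final float threshold;

    LocalityRefresher(float threshold) { this.threshold = threshold; }

    /** Re-opens store files whose locality changed; returns their paths. */
    List<String> refresh(List<String> regionNames,
                         Map<String, List<RefreshableStoreFile>> storeFilesByRegion) {
        List<String> reopened = new ArrayList<>();
        for (String region : regionNames) {
            // Regions not hosted on this server are skipped.
            List<RefreshableStoreFile> files =
                storeFilesByRegion.getOrDefault(region, List.of());
            for (RefreshableStoreFile file : files) {
                float current = file.computeCurrentLocality();
                if (Math.abs(current - file.cachedLocality()) > threshold) {
                    file.reopenReader();
                    reopened.add(file.path());
                }
            }
        }
        return reopened;
    }
}
```

In this sketch, only files whose locality moved by more than the threshold
are re-opened, so a refresh request for a region that was not touched by the
LocalityHealer does no reader work beyond recomputing locality.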