[ https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425868#comment-17425868 ]

Bryan Beaudreault commented on HBASE-26304:
-------------------------------------------

I have a proof of concept working in a test cluster with the two HDFS issues 
mentioned above. It works well, though, as mentioned above, I still need to 
figure out how to update localityIndex, i.e. how to trigger 
computeHdfsBlockDistribution in StoreFileInfo.
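For context on what "updating localityIndex" means, here is a minimal sketch of how such an index can be computed, assuming it is the fraction of a StoreFile's bytes whose blocks have a replica on the RegionServer's own host (the ratio HBase's HDFSBlocksDistribution is generally understood to report). BlockInfo and LocalitySketch are hypothetical stand-ins for illustration only, not HBase's HDFSBlocksDistribution or StoreFileInfo.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of one HDFS block: its length in bytes and
// the hosts currently holding a replica. Not an HDFS or HBase class.
final class BlockInfo {
    final long lengthBytes;
    final Set<String> replicaHosts;

    BlockInfo(long lengthBytes, Set<String> replicaHosts) {
        this.lengthBytes = lengthBytes;
        this.replicaHosts = replicaHosts;
    }
}

// Hypothetical helper illustrating a byte-weighted locality ratio.
final class LocalitySketch {
    private LocalitySketch() {}

    /**
     * Returns the fraction of the file's bytes whose block has a replica on
     * {@code host}, or 0.0 for a file with no bytes.
     */
    static double localityIndex(List<BlockInfo> blocks, String host) {
        long totalBytes = 0;
        long localBytes = 0;
        for (BlockInfo block : blocks) {
            totalBytes += block.lengthBytes;
            if (block.replicaHosts.contains(host)) {
                localBytes += block.lengthBytes;
            }
        }
        return totalBytes == 0 ? 0.0 : (double) localBytes / totalBytes;
    }
}
```

Weighting by bytes rather than by block count keeps a small trailing block from skewing the index. Once the LocalityHealer moves a block onto the RegionServer's host, recomputing this ratio from fresh block locations is what would make the metric reflect the improvement.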

> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
>                 Key: HBASE-26304
>                 URL: https://issues.apache.org/jira/browse/HBASE-26304
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> Once the LocalityHealer has improved the locality of a StoreFile (by moving 
> blocks onto the correct host), the Reader's DFSInputStream and the Region's 
> localityIndex metric must be refreshed. Without refreshing the 
> DFSInputStream, the improved locality will not improve latencies. In fact, 
> the DFSInputStream may try to fetch blocks that have moved, resulting in a 
> ReplicaNotFoundException. This is retried automatically, but the retry 
> increases long-tail latencies by an amount that depends on the configured 
> backoff strategy.
> See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in 
> backoff strategy that can greatly mitigate the latency impact of the 
> missing-block retry.
> Even with that mitigation, a StoreFile is often made up of many blocks, so 
> without some sort of intervention we will keep hitting 
> ReplicaNotFoundException over time, once per moved block, as clients 
> naturally request data from those blocks.
> In the original LocalityHealer design, I created a new 
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list 
> of region names and, for each store in those regions, re-opens any 
> underlying StoreFile whose locality has changed.
> I will submit a PR with that implementation, but I am also investigating 
> other avenues. For example, I noticed 
> https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal 
> but could perhaps be improved into automatic, lower-level handling of block 
> moves.
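The per-StoreFile decision made by the RefreshHDFSBlockDistribution RPC described above can be sketched as follows. This is a minimal illustration, not the PR's code: StoreFileHandle, RefreshSketch, and the epsilon tolerance are hypothetical names, and the tolerance is an assumption added so that floating-point noise does not trigger a reopen.

```java
import java.util.List;

// Hypothetical handle for an open StoreFile and its reader; not an HBase API.
interface StoreFileHandle {
    double cachedLocality();  // locality recorded when the reader was opened
    double freshLocality();   // locality recomputed from current block locations
    void reopen();            // rebuild the reader so it sees the new block locations
}

// Hypothetical helper sketching "re-open if the locality has changed".
final class RefreshSketch {
    private RefreshSketch() {}

    /**
     * Reopens every file whose locality moved by more than {@code epsilon}
     * (an assumed tolerance) and returns how many files were reopened.
     */
    static int refreshIfLocalityChanged(List<? extends StoreFileHandle> files, double epsilon) {
        int reopened = 0;
        for (StoreFileHandle file : files) {
            if (Math.abs(file.freshLocality() - file.cachedLocality()) > epsilon) {
                file.reopen();
                reopened++;
            }
        }
        return reopened;
    }
}
```

Reopening only the files whose locality actually moved keeps the RPC cheap when most blocks in a region stay where they were.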



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
