bbeaudreault opened a new pull request #3704:
URL: https://github.com/apache/hbase/pull/3704


   This is just one (important but small) part of 
[HBASE-26250](https://issues.apache.org/jira/browse/HBASE-26250).
   I wanted to get this PR up for input, since there may be other approaches or 
implications to consider.
   
   I have intentionally not wired this up to the RPC layer yet. I don't think 
there's much harm in merging this to master, except that it might lock us into 
compatibility requirements once it is exposed through RPC/Admin (which is the 
eventual intent). Alternatively, I could fully wire this up but merge it only 
into a feature branch.
   
   For convenience, here is the original description from the issue:
   
   > Once the LocalityHealer has improved locality of a StoreFile (by moving 
blocks onto the correct host), the Reader's DFSInputStream and Region's 
localityIndex metric must be refreshed. Without refreshing the DFSInputStream, 
the improved locality will not improve latencies. In fact, the DFSInputStream 
may try to fetch blocks that have moved, resulting in a 
ReplicaNotFoundException. This is automatically retried, but the retry will 
increase long-tail latencies by an amount that depends on the configured 
backoff strategy.
   > 
   > See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement to 
the backoff strategy that can greatly mitigate the latency impact of the 
missing-block retry.
   >
   > Even with that mitigation, a StoreFile is often made up of many blocks. 
Without some sort of intervention, we will continue to hit 
ReplicaNotFoundException over time as clients naturally request data from moved 
blocks.
   >
   > In the original LocalityHealer design, I created a new 
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list 
of region names and, for each region store, re-opens the underlying StoreFile 
if the locality has changed.
   >
   > I will submit a PR with that implementation, but I am also investigating 
other avenues. For example, I noticed 
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal as 
is but could perhaps be improved into an automatic, lower-level way of handling 
block moves.
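
   To make the refresh flow from the quoted description concrete, here is a rough Java sketch of the idea: take a list of region names and, for each store file in those regions, re-open it only if its locality has changed. This is not the code in this PR and does not use HBase or HDFS classes. `StoreFileModel`, `LocalityLookup`, and `refresh` are hypothetical stand-ins invented for illustration, and the "re-open" is modeled as simply updating a cached locality value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RefreshSketch {

    /** Hypothetical stand-in for a store file plus the state of its open reader. */
    static class StoreFileModel {
        final String path;
        double cachedLocality; // locality observed when the reader was last opened
        int reopenCount = 0;

        StoreFileModel(String path, double cachedLocality) {
            this.path = path;
            this.cachedLocality = cachedLocality;
        }

        /** Models re-opening the reader so it picks up the new block locations. */
        void reopen(double freshLocality) {
            this.cachedLocality = freshLocality;
            this.reopenCount++;
        }
    }

    /** Hypothetical lookup of a file's current locality (0.0 to 1.0) on this host. */
    interface LocalityLookup {
        double currentLocality(String path);
    }

    /**
     * For each requested region, re-open only the store files whose locality
     * differs from the cached value. Returns the paths that were re-opened.
     */
    static List<String> refresh(List<String> regionNames,
                                Map<String, List<StoreFileModel>> filesByRegion,
                                LocalityLookup lookup) {
        List<String> reopened = new ArrayList<>();
        for (String region : regionNames) {
            List<StoreFileModel> files = filesByRegion.get(region);
            if (files == null) {
                continue; // region is not hosted here; nothing to refresh
            }
            for (StoreFileModel file : files) {
                double fresh = lookup.currentLocality(file.path);
                if (Double.compare(fresh, file.cachedLocality) != 0) {
                    file.reopen(fresh);
                    reopened.add(file.path);
                }
            }
        }
        return reopened;
    }

    public static void main(String[] args) {
        StoreFileModel moved = new StoreFileModel("/hbase/r1/a", 0.5);
        StoreFileModel unchanged = new StoreFileModel("/hbase/r1/b", 1.0);
        Map<String, List<StoreFileModel>> filesByRegion = new HashMap<>();
        filesByRegion.put("r1", Arrays.asList(moved, unchanged));

        // Pretend every file is now fully local after the blocks were moved.
        List<String> reopened =
            refresh(Arrays.asList("r1", "not-hosted"), filesByRegion, path -> 1.0);
        System.out.println(reopened);
    }
}
```

   Skipping files whose locality is unchanged keeps the cost of a refresh proportional to the number of files that actually had blocks moved, rather than to every file in the region.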
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
