[ 
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502334#comment-17502334
 ] 

Hudson commented on HBASE-26304:
--------------------------------

Results for branch master
        [build #528 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
>                 Key: HBASE-26304
>                 URL: https://issues.apache.org/jira/browse/HBASE-26304
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Edit: Description updated to avoid needing to read the full investigation 
> laid out in the comments.
> Once the LocalityHealer has improved locality of a StoreFile (by moving 
> blocks onto the correct host), the Reader's DFSInputStream and Region's 
> localityIndex metric must be refreshed. Without refreshing the 
> DFSInputStream, the improved locality will not improve latencies. In fact, 
> the DFSInputStream may try to fetch blocks that have moved, resulting in a 
> ReplicaNotFoundException. This is automatically retried, but the retry will 
> temporarily increase long-tail latencies by an amount that depends on the 
> configured backoff strategy.
> In the original LocalityHealer design, I created a new 
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list 
> of region names and, for each region store, re-opens the underlying StoreFile 
> if the locality has changed. This implementation was complicated, both in 
> integrating callbacks into the HDFS Dispatcher and in safely re-opening 
> StoreFiles without impacting reads or caches. 
> In working to port the LocalityHealer to the Apache projects, I'm taking a 
> different approach:
>  * The part of the LocalityHealer that moves blocks will be an HDFS project 
> contribution
>  * As such, the DFSClient should be able to more gracefully recover from 
> block moves.
>  * Additionally, HBase has some caches of block locations for locality 
> reporting and the balancer. Those need to be kept up-to-date.
> The DFSClient improvements are covered in HDFS-16261 and HDFS-16262. As such, 
> this issue becomes about updating HBase's block location caches.
> I considered a few different approaches, but the most elegant one I could 
> come up with was to tie the HDFSBlockDistribution metrics directly to the 
> underlying DFSInputStream of each StoreFile's initialReader. That way, our 
> locality metrics reflect exactly the block locations that our reads go 
> through. This also means that our locality metrics will naturally adjust as 
> the DFSInputStream adjusts to block moves.
> Once we have accurate locality metrics on the regionserver, the Balancer's 
> cache can easily be invalidated via our usual heartbeat methods. 
> RegionServers report to the HMaster periodically, which keeps a 
> ClusterMetrics object up to date. Right before each balancer invocation, the 
> balancer is updated with the latest ClusterMetrics. At this time, we compare 
> the old ClusterMetrics to the new, and invalidate the caches for any regions 
> whose locality has changed.
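
The comparison step described at the end of the description can be illustrated with a minimal sketch. It assumes each ClusterMetrics snapshot has already been reduced to a map from region name to locality (0.0 to 1.0). The class name LocalityChangeDetector, the method changedRegions, and the epsilon tolerance are hypothetical and are not part of HBase; the actual patch may structure this differently.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not an HBase class.
class LocalityChangeDetector {

  /**
   * Returns the regions whose cached locality should be invalidated:
   * regions whose locality differs between the two snapshots by more
   * than epsilon, and regions present in only one of the snapshots.
   */
  static Set<String> changedRegions(Map<String, Float> oldLocality,
      Map<String, Float> newLocality, float epsilon) {
    Set<String> changed = new HashSet<>();
    // Regions that are new or whose locality moved by more than epsilon.
    for (Map.Entry<String, Float> entry : newLocality.entrySet()) {
      Float previous = oldLocality.get(entry.getKey());
      if (previous == null || Math.abs(previous - entry.getValue()) > epsilon) {
        changed.add(entry.getKey());
      }
    }
    // Regions that disappeared from the latest snapshot.
    for (String region : oldLocality.keySet()) {
      if (!newLocality.containsKey(region)) {
        changed.add(region);
      }
    }
    return changed;
  }
}
```

A caller would pass the previous and latest snapshots right before a balancer run and drop the cached block distribution for every region name returned. The tolerance is there only to avoid invalidating on floating-point noise.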



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
