[
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502334#comment-17502334
]
Hudson commented on HBASE-26304:
--------------------------------
Results for branch master
[build #528 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/General_20Nightly_20Build_20Report/]
(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/528/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Edit: Description updated to avoid needing to read the full investigation
> laid out in the comments.
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not improve latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. This is automatically retried, but the retry will
> temporarily increase long-tail latencies by an amount that depends on the
> configured backoff strategy.
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region store, re-opens the underlying StoreFile
> if the locality has changed. This implementation was complicated both in
> integrating callbacks into the HDFS Dispatcher and in terms of safely
> re-opening StoreFiles without impacting reads or caches.
> In working to port the LocalityHealer to the Apache projects, I'm taking a
> different approach:
> * The part of the LocalityHealer that moves blocks will be an HDFS project
> contribution
> * As such, the DFSClient should be able to more gracefully recover from
> block moves.
> * Additionally, HBase has some caches of block locations for locality
> reporting and the balancer. Those need to be kept up-to-date.
> The DFSClient improvements are covered in HDFS-16261 and HDFS-16262. As such,
> this issue becomes about updating HBase's block location caches.
> I considered a few different approaches, but the most elegant one I could
> come up with was to tie the HDFSBlockDistribution metrics directly to the
> underlying DFSInputStream of each StoreFile's initialReader. That way, our
> locality metrics exactly represent the block allocations that our reads
> go through. This also means that our locality metrics will
> naturally adjust as the DFSInputStream adjusts to block moves.
> Once we have accurate locality metrics on the regionserver, the Balancer's
> cache can easily be invalidated via our usual heartbeat methods.
> RegionServers report to the HMaster periodically, which keeps its
> ClusterMetrics up to date. Right before each balancer invocation, the
> balancer is updated with the latest ClusterMetrics. At this time, we compare
> the old ClusterMetrics to the new, and invalidate the caches for any regions
> whose locality has changed.
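
The comparison step described in the last paragraph can be sketched roughly as follows. This is a minimal illustration, not the HBase implementation: the class LocalityChangeDetector, the method regionsWithChangedLocality, the EPSILON tolerance, and the plain maps of region name to locality (standing in for the old and new ClusterMetrics snapshots) are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the name and shape are assumptions for illustration only.
class LocalityChangeDetector {
    // Assumed tolerance so that float noise does not trigger an invalidation.
    private static final float EPSILON = 1e-4f;

    /**
     * Returns the names of regions whose locality differs between the two
     * snapshots, plus regions that appear in only one of them.
     */
    static Set<String> regionsWithChangedLocality(
            Map<String, Float> oldLocality, Map<String, Float> newLocality) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Float> entry : newLocality.entrySet()) {
            Float previous = oldLocality.get(entry.getKey());
            // New region, or locality moved by more than the tolerance.
            if (previous == null
                    || Math.abs(previous - entry.getValue()) > EPSILON) {
                changed.add(entry.getKey());
            }
        }
        // Regions that were reported before but are missing now.
        for (String region : oldLocality.keySet()) {
            if (!newLocality.containsKey(region)) {
                changed.add(region);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Float> before = new HashMap<>();
        before.put("region-a", 0.50f);
        before.put("region-b", 1.00f);

        Map<String, Float> after = new HashMap<>();
        after.put("region-a", 0.95f); // healed out of band
        after.put("region-b", 1.00f); // unchanged
        after.put("region-c", 0.20f); // newly reported

        // A balancer-side cache would drop its entries for these regions.
        System.out.println(regionsWithChangedLocality(before, after));
    }
}
```

Only the regions returned by the comparison would need their cached block locations dropped; entries for unchanged regions can stay in place.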