[
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432456#comment-17432456
]
Bryan Beaudreault commented on HBASE-26304:
-------------------------------------------
As mentioned above, I have implementations for the above two HDFS issues, and
they work great for ensuring HBase is able to take advantage of new locality
improvements without any DFSClient warnings. Before pushing PRs for those, I'm
now taking a look at the localityIndex reporting issue, in case it affects the
strategy. The core problem is that when a StoreFile is opened, a StoreFileInfo
object is created. Initializing that StoreFileInfo calls
computeHDFSBlocksDistribution and caches the result for the lifetime of the
StoreFileInfo. The resulting value is available via the
getHDFSBlockDistribution method.
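The compute-once caching described above can be sketched roughly like this. This is a simplified illustration, not the real HBase classes; the actual StoreFileInfo carries much more state, and the real computation queries the NameNode for block locations:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the real org.apache.hadoop.hbase.HDFSBlocksDistribution.
class HDFSBlocksDistribution {
  final Map<String, Long> hostToWeight = new HashMap<>();
  long totalWeight;
}

// Illustrative sketch of the compute-once pattern: the distribution is
// computed when the StoreFileInfo is initialized and cached for the
// lifetime of the object, never refreshed.
class StoreFileInfo {
  private final HDFSBlocksDistribution hdfsBlocksDistribution;

  StoreFileInfo() {
    // Computed once at open time and cached for the object's lifetime.
    this.hdfsBlocksDistribution = computeHDFSBlocksDistribution();
  }

  HDFSBlocksDistribution getHDFSBlockDistribution() {
    return hdfsBlocksDistribution; // stale if blocks have since moved
  }

  private HDFSBlocksDistribution computeHDFSBlocksDistribution() {
    // In reality this asks the NameNode where each block's replicas live.
    return new HDFSBlocksDistribution();
  }
}
```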
The getHDFSBlockDistribution method has three usages:
* RatioBasedCompactionPolicy and DateTieredCompactionPolicy use it to force a
major compaction on files whose BlockLocalityIndex is less than a threshold
* The value is aggregated for all StoreFiles in an HRegion, and used to create
RegionLoad objects. RegionLoads are created in a few ways:
** On demand, when loading RegionServer UI "Regions" section
** On demand, through HBaseAdmin.getRegionLoad(ServerName, TableName)
** Periodically, when reporting the heartbeat to the HMaster, by default every
3s. The HMaster uses these in a few ways:
*** Available to query via HBaseAdmin
*** Used in HMaster UI, where you can see localityIndex when viewing table page
*** Used in various load balancer functions (though not localityIndex, since
the balancer computes that separately)
* The value is aggregated for all StoreFiles in an HRegion, and used to report
localityIndex metrics.
** This happens in a thread which executes on an interval, by default 5s. The
resulting metrics are available in JMX, hbtop, and the "Server Metrics" section
at the top of RegionServer UIs.
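For reference, the region-level localityIndex in both of the aggregated usages above boils down to: locally-stored block weight divided by total block weight, summed over all store files in the region. A minimal sketch of that aggregation (names are illustrative, not the actual HBase API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative aggregation of per-StoreFile block weights into a
// region-level locality index; not the real HBase implementation.
class LocalityAggregator {
  private final Map<String, Long> hostToWeight = new HashMap<>();
  private long totalWeight;

  // Record one store file's block weight held on a given host,
  // plus that file's total block weight across all hosts.
  void add(String host, long weightOnHost, long fileTotalWeight) {
    hostToWeight.merge(host, weightOnHost, Long::sum);
    totalWeight += fileTotalWeight;
  }

  // localityIndex as seen from the host serving this region:
  // bytes stored locally / total bytes, in [0.0, 1.0].
  float getBlockLocalityIndex(String host) {
    if (totalWeight == 0) {
      return 0.0f;
    }
    return (float) hostToWeight.getOrDefault(host, 0L) / totalWeight;
  }
}
```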
All of these usages are non-time-sensitive, i.e. not in a core read path or
anything. As such, I think we could treat the StoreFileInfo
hdfsBlockDistribution as a cache which must be cleared. Previously it cached a
value that rarely changed; now we need more control over clearing it. I can
think of three options for this:
* We could create a periodic chore which reloads the cached value for all
store files. This could be filtered to only clear values which are not fully
local.
* We could add a TTL on the cached value, which gets enforced at read time. In
other words, when getHDFSBlockDistribution is called, re-compute if TTL is
expired. We could similarly limit this to only files which are not fully local.
* We could use some trigger from the DFSInputStream to intelligently refresh
the HDFSBlockDistribution only if the underlying stream has been updated. I
think this would have to happen at the HStoreFile level, which has a similar
getHDFSBlockDistribution method that is the only caller of the StoreFileInfo
method.
The HStoreFile has access to the initialReader object which can access the
underlying FSDataInputStreamWrapper. We'd need to expose something in
DFSInputStream that can be used to trigger the logic.
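A rough sketch of what the second option (TTL enforced at read time) might look like. The TTL config, class, and names are all hypothetical; as noted above, the recompute could additionally be skipped for files that are already fully local:

```java
import java.util.function.Supplier;

// Sketch of option 2: re-compute the cached distribution at read time once
// a TTL has elapsed. The TTL value would come from a new (hypothetical)
// refresh-period config, which is exactly the config option 3 avoids.
class TtlCachedDistribution<T> {
  private final Supplier<T> compute;   // e.g. computeHDFSBlocksDistribution()
  private final long ttlMillis;        // hypothetical refresh interval
  private T cached;
  private long lastComputedAt;

  TtlCachedDistribution(Supplier<T> compute, long ttlMillis) {
    this.compute = compute;
    this.ttlMillis = ttlMillis;
  }

  // Called from getHDFSBlockDistribution(): serve the cached value unless
  // the TTL has expired, in which case recompute before returning.
  synchronized T get() {
    long now = System.currentTimeMillis();
    if (cached == null || now - lastComputedAt >= ttlMillis) {
      cached = compute.get();          // refresh lazily, at read time
      lastComputedAt = now;
    }
    return cached;
  }
}
```

The minor cost here is that an expired read pays the block-location fetch inline, which should be acceptable for the non-time-sensitive callers listed above.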
Of the options, I think the last one is the most appealing because we could
avoid yet another config (the refresh TTL/period). It is also the most involved
and requires some investigation. My second preference would be the second
option above, because I'd like to avoid another chore. I don't think the minor
latency
hit of fetching block locations should be an issue for any of the use cases
mentioned above.
I'm going to do a little more investigation into what the third option could
look like.
> Reflect out-of-band locality improvements in served requests
> ------------------------------------------------------------
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
>
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not improve latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. This is automatically retried, but the retry will
> increase long-tail latencies relative to the configured backoff strategy.
> See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in
> backoff strategy which can greatly mitigate latency impact of the missing
> block retry.
> Even with that mitigation, a StoreFile is often made up of many blocks.
> Without some sort of intervention, we will continue to hit
> ReplicaNotFoundException over time as clients naturally request data from
> moved blocks.
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region store, re-opens the underlying StoreFile
> if the locality has changed.
> I will submit a PR with that implementation, but I am also investigating
> other avenues. For example, I noticed
> https://issues.apache.org/jira/browse/HDFS-15119 which doesn't seem ideal but
> maybe can be improved as an automatic lower-level handling of block moves.