On Sun, Nov 15, 2020 at 9:16 AM Andrew Purtell <[email protected]> wrote:
> I agree with Duo’s comment that a performance gain is unlikely but would
> be orthogonal anyway;

Perf observation is just an aside in the issue. Perf is orthogonal as you
say above (as long as no regression).

> it’s an availability gain that is the goal. We can assume it based on
> theory of operation and unit test results but the gain should be tested and
> measured on a cluster too.
>

The feature is about distributing load on hbase:meta to alleviate
hotspotting; it makes read replicas more live, so replicas are more likely
to satisfy location lookups, making read replicas more effective. That read
replicas improve HA is presumed -- it was the original justification for
this years-old commit -- but HA is not the focus of this addition; hence no
reports on effectiveness in this area. I have no problem working on such
tests/reports but suggest that they are done post merge.

> That said, the results of the testing thus far indicate no regression,
> which gives me confidence to support a merge. Specifically, a merge to
> “unblock” 2.4 (we aren’t really blocked, we are waiting), provided the
> default there is the feature is configured off. But please indicate in
> documentation and release notes that the feature is not widely tested yet -
> as is customarily done for new functionality like this.
>

No problem w/ flagging the feature as new.

Thanks,
S

> > On Nov 15, 2020, at 5:20 AM, 张铎 <[email protected]> wrote:
> >
> > Replied on jira, I think we missed an important scenario when testing.
> >
> > Thanks.
> >
> > Stack <[email protected]> wrote on Sun, Nov 15, 2020 at 2:30 AM:
> >
> >> HBASE-18070 makes it so hbase:meta read replicas can run closer to the
> >> primary (sub-second lags rather than minutes). It adds Async WAL
> >> Replication[1] on the hbase:meta table; i.e. edits are sprayed across
> >> replicas as they arrive at the primary's WAL. Before this work, Async WAL
> >> Replication was only available on user-space tables and the only option
> >> for hbase:meta read replicas was reloading the primary's hfiles on a
> >> period (minutes). HBASE-18070 also adds an optional client-side
> >> 'LoadBalance' policy that favors read replicas ahead of primary reads,
> >> falling back to the primary on fault. Together, these additions allow
> >> distributing hbase:meta read load across primary and replicas,
> >> alleviating 'hotspotting'.
> >>
> >> I would like to merge the feature to master branch Monday evening if
> >> there is no objection (soon after I'll merge to branch-2 so this feature
> >> can hopefully be included in the upcoming 2.4.0RC).
> >>
> >> * For the design, see [2].
> >> * For an amalgamated PR of the 5 or 6 reviewed PRs that comprise this
> >> feature, see [3].
> >> * For a PE report that compared performance before and after, see
> >> HBASE-25127 (no regression).
> >> * A report on ITBLL runs is pending to be attached to HBASE-18070 but
> >> runs so far show no regression with the feature enabled (ITBLL runs were
> >> done against a backport of this feature to branch-2 as the ITBLL state of
> >> master is currently an unknown).
> >>
> >> Testing continues, mainly looking for further improvement and to better
> >> understand this feature in operation. Documentation is included but in
> >> need of polish (working on it).
> >>
> >> Dump any questions here and I'll be happy to respond. If you need more
> >> time to review, just shout.
> >>
> >> Thanks and thanks to all who contributed to this feature; the reviewers
> >> and the testers in particular.
> >>
> >> S
> >>
> >> 1. http://hbase.apache.org/book.html#_asnyc_wal_replication
> >> 2. https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#
> >> This patch is currently missing HBASE-25280, a bug found in testing.
> >> 3. https://github.com/apache/hbase/pull/2643
