Anyway I don’t wish to discuss this veto at length. What’s done is done. The merge is off for the 2.4 RC. It will proceed without this feature.
> On Nov 16, 2020, at 7:33 AM, Andrew Purtell <[email protected]> wrote: > > Your -1 throws a wrench into an agreement we had to merge this for 2.4. The > 2.4 RC has been waiting for two weeks for this reason. It is a rather > unfriendly act and frustrating to me as 2.4 RC. I feel this needs to be said. > Despite what you claim the reason for hedged reads has been explained here > and on the JIRA. > >> On Nov 15, 2020, at 11:20 PM, 张铎 <[email protected]> wrote: >> >> So what is your purpose of distributing the request of region location >> lookup? It is just because you want to 'distribute the request of region >> location lookup'? >> >> Then I'm -1 on merging. We should reach an agreement on what we want to >> solve before merging at least. >> >> I've helped this issue from the design doc step. For me, the purpose for >> this issue is clear. We want to prevent the hotspot of meta, so the >> solution is simple, enable meta replica, and then just modify the client to >> not always go to primary replica first(this is the old behavior even with >> meta replica feature on). >> And this will introduce another problem that, there is no meta region >> replication implementation for meta read replicas, which means the latency >> will be large as we can only sync the data between replicas through region >> flush, so we implement meta region replication. >> >> So I think it is very important to verify that we have truly distributed >> the request of region location lookup, and also make sure that we could >> support more requests of region location lookup. Otherwise this feature is >> useless. >> >> And I agree with Andrew that, since the feature is default off on branch-2 >> and has no regression, it is OK to merge for now. Theoretically our >> approach here should work, so even it does not work for now, I think we >> could fix the problems to make it work. >> >> But your reply above made me wonder whether we are talking about the same >> thing. That's why I'm -1 here. I'm not going to force you to do the test >> suggested by me, as I said it could be done after merging, just want to >> reach an agreement on the goal of this feature. >> >> Thanks. >> >> Stack <[email protected]> 于2020年11月16日周一 下午12:35写道: >> >>>> On Sun, Nov 15, 2020 at 9:16 AM Andrew Purtell <[email protected]> >>>> wrote: >>>> >>>> I agree with Duo’s comment that a performance gain is unlikely but would >>>> be orthogonal anyway; >>> >>> >>> Perf observation is just an aside in the issue. Perf is orthogonal as you >>> say above (as long as no regression). >>> >>> >>> >>>> it’s an availability gain that is the goal. We can assume it based on >>>> theory of operation and unit test results but the gain should be tested >>> and >>>> measured on a cluster too. >>>> >>> >>> >>> The feature is about distributing load on hbase:meta to alleviate >>> hotspotting; it makes read replicas more live so replicas are more likely >>> to satisfy location lookups making read replicas more effective. That read >>> replicas improve HA is presumed -- it was the original justification for >>> this years old commit -- but HA is not the focus of this addition; hence no >>> reports on effectiveness in this area. >>> >>> I have no problem working on such tests/reports but suggest that they are >>> done post merge. >>> >>> >>> >>>> That said, the results of the testing thus far indicate no regression, >>>> which gives me confidence to support a merge. Specifically, a merge to >>>> “unblock” 2.4 (we aren’t really blocked, we are waiting), provided the >>>> default there is the feature is configured off. But please indicate in >>>> documentation and release notes that the feature is not widely tested >>> yet - >>>> as is customarily done for new functionality like this. >>>> >>>> >>> No problem w/ flagging the feature as new. >>> >>> Thanks, >>> S >>> >>> >>> >>>> >>>>>> On Nov 15, 2020, at 5:20 AM, 张铎 <[email protected]> wrote: >>>>>> >>>>>> Replied on jira, I think we missed an important scenario when testing. >>>>>> >>>>>> Thanks. >>>>>> >>>>>> Stack <[email protected]> 于2020年11月15日周日 上午2:30写道: >>>>>> >>>>>>> HBASE-18070 makes it so hbase:meta read replicas can run closer to the >>>>>>> primary, (< second lags rather than minutes). It adds Async WAL >>>>>>> Replication[1] on the hbase:meta table; i.e. edits are sprayed across >>>>>>> replicas as they arrive at the primary's WAL. Before this work, Async >>>>> WAL >>>>>>> Replication was only available on user-space tables and the only >>> option >>>> for >>>>>> hbase:meta read replicas was reloading the primaries hfiles on a >>> period >>>>>> (minutes). HBASE-18070 also adds an optional client-side 'LoadBalance' >>>>>> policy that favors read replicas ahead of primary reads falling back >>> to >>>> the >>>>>> primary on fault. Together, these additions allow distributing >>>> hbase:meta >>>>>> read load across primary and replicas alleviating 'hotspotting'. >>>>>> >>>>>> I would like to merge the feature to master branch Monday evening if >>>> there >>>>>> is no objection (Soon after I'll merge to branch-2 so this feature can >>>>>> hopefully be included in the upcoming 2.4.0RC). >>>>>> >>>>>> * For the design, see [2]. >>>>>> * For an amalgamated PR of the 5 or 6 reviewed PRs that comprise this >>>>>> feature, see [3]. >>>>>> * For a PE report that compared performance before and after, see >>>>>> HBASE-25127 (no regression). >>>>>> * A report on ITBLL runs is pending to be attached to HBASE-18070 but >>>> runs >>>>>> so far show no regression with the feature enabled (ITBLL runs were >>> done >>>>>> against a backport of this feature to branch-2 as the ITBLL state of >>>> master >>>>>> is currently an unknown). >>>>>> >>>>>> Testing continues mainly looking for further improvement and to better >>>>>> understand this feature in operation. Documentation is included but in >>>> need >>>>>> of polish (working on it). >>>>>> >>>>>> Dump any questions here and I'll be happy to respond. If you need more >>>> time >>>>>> to review, just shout. >>>>>> >>>>>> Thanks and thanks to all who contributed to this feature; the >>> reviewers >>>> and >>>>>> the testers in particular. >>>>>> >>>>>> S >>>>>> >>>>>> 1. http://hbase.apache.org/book.html#_asnyc_wal_replication >>>>>> 2. >>>>>> >>>>>> >>>> >>> https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit# >>>>>> This patch is currently missing HBASE-25280, a bug found in testing. >>>>>> 3. https://github.com/apache/hbase/pull/2643 >>>>>> >>>> >>>
