+1 on merge to Master.
I appreciate the details you shared, Duo, but I think I'm still -0 on a
branch-2 merge at this point. I'm with Stack: I would rather pull it into
a fast 2.1 release.
On 1/8/18 8:22 PM, 张铎(Duo Zhang) wrote:
Anyway, if there are no objections to merging this into master, let's do it? So that
we can start working on the follow-on features, such as table based
replication storage, synchronous replication, etc.
Thanks.
2018-01-09 7:19 GMT+08:00 Stack <st...@duboce.net>:
On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang) <palomino...@gmail.com>
wrote:
This 'new' feature only changes the DDL part, not the core part of replication,
i.e., how to read wal entries and how to replicate them to the remote cluster,
etc. There is also no pb message/storage layout change, so you can think
of this as a big refactoring. Theoretically we do not even need to add new
UTs for this feature, i.e., no extra stability work. The only visible
change to users is that modifying peers in the shell may take a bit more
time. So in general I think there is little harm in including it in the coming
release?
And the reason I think it SHOULD be included in our 2.0 release is that the
synchronous guarantee is really a good thing for our replication-related
UTs. The correctness of the current Test***Replication tests usually depends
on a flaky condition: that no log rolling happens between the modification
on zk and the actual loading of the modification on the RS. We have also
done hard work to clean up the locking in ReplicationSourceManager and added a
fat comment to explain why it has to be synchronized in this way. In general,
the new code is much easier to read, test and debug, and it also reduces the
possibility of flakiness, which will help us a lot when we start to
stabilize our build.
Thanks.
You see it as a big bug fix Duo?
Kind of. Just like AM v1: we can do lots of fixes to make it more stable,
but we know that we cannot fix all the problems, since we store state in
several places and it is 'mission impossible' to keep all the states
in sync under every situation... So we introduced AM v2.
For the replication peer tracking, it is the same problem. It is hard to do
fencing with a zk watcher since it is asynchronous, so the UTs are always
somewhat flaky in theory. And we depend on replication heavily at
Xiaomi; it has always been a pain for us.
I'm late to review. Will have a look after beta-1 goes out. This stuff
looks great from the outside, especially the distributed procedure framework,
which we need all over the place.
In general I have no problem w/ this in master and an hbase 2.1 (2.1 could
be soon after 2.0). It's late for big stuff in 2.0.... but let me take a
looksee sir.
Thanks sir. All the concerns here about whether we should merge this into
2.0 are reasonable, I know. Although I really want this in 2.0 because I
believe it will help a lot with stabilizing, I'm OK with merging it into 2.1
only, if you guys all think so.
St.Ack
2018-01-09 4:53 GMT+08:00 Apekshit Sharma <a...@cloudera.com>:
Same questions as Josh's.
1) We have RCs for beta1 now, which means the only commits that can go in
are bug fixes. This change - although important, needed for a long time and
well done (testing, summary, etc.) - seems very large to get into 2.0
now. It needs good justification for why it has to be 2.0 instead of 2.1.
-- Appy
On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser <els...@apache.org> wrote:
-0. From a general project planning point-of-view (not based on the
technical merit of the code), I am uncomfortable pulling in a brand
new feature after we've already made one beta RC.
Duo -- can you expand on why this feature is so important that we should
break our release plan? Are there problems that would make including this
in a 2.1/3.0 unnecessarily difficult? Any kind of color you can provide on
"why does this need to go into 2.0?" would be helpful.
On 1/6/18 1:54 AM, Duo Zhang wrote:
https://issues.apache.org/jira/browse/HBASE-19397
We aim to move the peer modification framework from zk watcher to procedure
v2 in this issue, and the work is done now.
Copying the release note here:
Introduce 5 procedures to do peer modifications:
AddPeerProcedure
RemovePeerProcedure
UpdatePeerConfigProcedure
EnablePeerProcedure
DisablePeerProcedure
The procedures are all executed in the following stages:
1. Call the pre CP hook; if an exception is thrown then give up.
2. Check whether the operation is valid; if not then give up.
3. Update the peer storage. Notice that once we have entered this stage, we
cannot roll back any more.
4. Schedule sub procedures to refresh the peer config on every RS.
5. Do post cleanup if any.
6. Call the post CP hook. Any exception thrown will be ignored since we have
already done the work.
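To make the staging concrete, here is a minimal, self-contained sketch in plain Java of how an add-peer operation walks through the six stages. It is not the actual procedure code; AddPeerFlowSketch, CoprocessorHooks, PeerStorage and RsRefresher are hypothetical placeholders used only to illustrate the ordering and the point of no return.

public class AddPeerFlowSketch {
  interface CoprocessorHooks {
    void preAddPeer(String peerId) throws Exception; // stage 1: may veto
    void postAddPeer(String peerId);                 // stage 6: best effort
  }
  interface PeerStorage {
    boolean peerExists(String peerId);
    void addPeer(String peerId, String clusterKey);  // stage 3: point of no return
  }
  interface RsRefresher {
    void refreshPeerOnAllRegionServers(String peerId); // stage 4: sub procedures
  }

  private final CoprocessorHooks hooks;
  private final PeerStorage storage;
  private final RsRefresher refresher;

  AddPeerFlowSketch(CoprocessorHooks hooks, PeerStorage storage, RsRefresher refresher) {
    this.hooks = hooks;
    this.storage = storage;
    this.refresher = refresher;
  }

  void addPeer(String peerId, String clusterKey) throws Exception {
    hooks.preAddPeer(peerId);                        // 1. pre CP hook; an exception aborts
    if (storage.peerExists(peerId)) {                // 2. validity check; give up if invalid
      throw new IllegalStateException("Peer " + peerId + " already exists");
    }
    storage.addPeer(peerId, clusterKey);             // 3. update peer storage; no rollback after this
    refresher.refreshPeerOnAllRegionServers(peerId); // 4. refresh the peer config on every RS
    // 5. post cleanup would go here if this operation needed any
    try {
      hooks.postAddPeer(peerId);                     // 6. post CP hook
    } catch (RuntimeException e) {
      // ignored: the peer has already been added
    }
  }
}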
The procedure will hold an exclusive lock on the peer id, so there are now
no concurrent modifications on a single peer.
And it is now guaranteed that once the procedure is done, the peer
modification has already taken effect on all RSes.
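From the client side, that guarantee means a plain Admin call now blocks until every RS has picked up the change. A small hedged example against the HBase 2.0 client API follows; the cluster key and peer id are made-up values, and once each call returns, the corresponding *PeerProcedure has completed on all RSes.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class AddPeerExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      ReplicationPeerConfig peerConfig = ReplicationPeerConfig.newBuilder()
          .setClusterKey("peer-zk1,peer-zk2,peer-zk3:2181:/hbase") // made-up peer cluster
          .build();
      // Backed by AddPeerProcedure: when this returns, every RS has loaded the new peer.
      admin.addReplicationPeer("peer_1", peerConfig);
      // The other peer operations are synchronous in the same way.
      admin.disableReplicationPeer("peer_1");
      admin.enableReplicationPeer("peer_1");
      admin.removeReplicationPeer("peer_1");
    }
  }
}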
Abstracted a storage layer for replication peer/queue management, and
refactored the upper layer to remove zk-related naming/code/comments.
Added pre/postExecuteProcedures CP hooks to RegionServerObserver, and added
a permission check for the executeProcedures method which requires the caller
to be the system user or a superuser.
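For coprocessor authors, here is a hedged sketch of what hooking these new methods could look like. The hook names come from the release note above, but the exact signatures are assumed from the usual HBase 2.0 observer pattern, and ProcedureAuditObserver is a made-up class name.

import java.io.IOException;
import java.util.Optional;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionServerCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionServerCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionServerObserver;

// Observes procedure execution requests (e.g. peer refreshes) arriving at a region server.
public class ProcedureAuditObserver implements RegionServerCoprocessor, RegionServerObserver {

  @Override
  public Optional<RegionServerObserver> getRegionServerObserver() {
    return Optional.of(this);
  }

  @Override
  public void preExecuteProcedures(ObserverContext<RegionServerCoprocessorEnvironment> ctx)
      throws IOException {
    // Called before the RS runs procedures sent by the Master.
    // Throwing an IOException here would reject the request.
  }

  @Override
  public void postExecuteProcedures(ObserverContext<RegionServerCoprocessorEnvironment> ctx)
      throws IOException {
    // Called after the procedures have been handed off for execution on this RS.
  }
}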
On rolling upgrade: just do not make any replication peer modifications
during the rolling upgrade. There are no pb/layout changes to the
peer/queue storage on zk.
And there are other benefits.
First, we have introduced a general procedure framework to send tasks to RSes
and report the results back to the Master. It can be used to implement other
operations such as ACL changes.
Second, zk is now used only as external storage since we do not depend on zk
watchers any more, so it will be much easier to implement a 'table based'
replication peer/queue storage.
Please vote:
[+1] Agree
[-1] Disagree
[0] Neutral
Thanks.
--
-- Appy