On Mon, Jan 8, 2018 at 2:50 PM, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> This 'new' feature only changes DDL part, not the core part of replication,
> i.e, how to read wal entries and how to replicate it to the remote cluster,
> etc. And also there is no pb message/storage layout change, you can think
> of this as a big refactoring. Theoretically we even do not need to add new
> UTs for this feature, i.e, no extra stability works. The only visible
> change to users is that it may require more time on modifying peers in
> shell. So in general I think it is less hurt to include it in the coming
> release?
>
> And why I think it SHOULD be included in our 2.0 release is that, the
> synchronous guarantee is really a good thing for our replication related
> UTs. The correctness of the current Test***Replication usually depends on a
> flakey condition - we will not do a log rolling between the modification on
> zk and the actual loading of the modification on RS. And we have also done
> a hard work to cleanup the lockings in ReplicationSourceManager and add a
> fat comment to say why it should be synchronized in this way. In general,
> the new code is much easier to read, test and debug, and also reduce the
> possibility of flakeyness, which could help us a lot when we start to
> stabilize our build.
>
> Thanks.
>
>
You see it as a big bug fix Duo?

I'm late to review. Will have a look after beta-1 goes out. This stuff
looks great from outside, especially distributed procedure framework which
we need all over the place.

In general I have no problem w/ this in master and an hbase 2.1 (2.1 could
be soon after 2.0). Its late for big stuff in 2.0.... but let me take a
looksee sir.

St.Ack





> 2018-01-09 4:53 GMT+08:00 Apekshit Sharma <a...@cloudera.com>:
>
> > Same questions as Josh's.
> > 1) We have RCs for beta1 now, which means only commits that can go in are
> > bug fixes only. This change - although important, needed from long time
> and
> > well done (testing, summary, etc) - seems rather very large to get into
> 2.0
> > now. Needs good justification why it has to be 2.1 instead of 2.0.
> >
> > -- Appy
> >
> >
> > On Mon, Jan 8, 2018 at 8:34 AM, Josh Elser <els...@apache.org> wrote:
> >
> > > -0 From a general project planning point-of-view (not based on the
> > > technical merit of the code) I am uncomfortable about pulling in a
> brand
> > > new feature after we've already made one beta RC.
> > >
> > > Duo -- can you expand on why this feature is so important that we
> should
> > > break our release plan? Are there problems that would make including
> this
> > > in a 2.1/3.0 unnecessarily difficult? Any kind of color you can provide
> > on
> > > "why does this need to go into 2.0?" would be helpful.
> > >
> > >
> > > On 1/6/18 1:54 AM, Duo Zhang wrote:
> > >
> > >> https://issues.apache.org/jira/browse/HBASE-19397
> > >>
> > >> We aim to move the peer modification framework from zk watcher to
> > >> procedure
> > >> v2 in this issue and the work is done now.
> > >>
> > >> Copy the release note here:
> > >>
> > >> Introduce 5 procedures to do peer modifications:
> > >>
> > >>> AddPeerProcedure
> > >>> RemovePeerProcedure
> > >>> UpdatePeerConfigProcedure
> > >>> EnablePeerProcedure
> > >>> DisablePeerProcedure
> > >>>
> > >>> The procedures are all executed with the following stage:
> > >>> 1. Call pre CP hook, if an exception is thrown then give up
> > >>> 2. Check whether the operation is valid, if not then give up
> > >>> 3. Update peer storage. Notice that if we have entered this stage,
> then
> > >>> we
> > >>> can not rollback any more.
> > >>> 4. Schedule sub procedures to refresh the peer config on every RS.
> > >>> 5. Do post cleanup if any.
> > >>> 6. Call post CP hook. The exception thrown will be ignored since we
> > have
> > >>> already done the work.
> > >>>
> > >>> The procedure will hold an exclusive lock on the peer id, so now
> there
> > is
> > >>> no concurrent modifications on a single peer.
> > >>>
> > >>> And now it is guaranteed that once the procedure is done, the peer
> > >>> modification has already taken effect on all RSes.
> > >>>
> > >>> Abstracte a storage layer for replication peer/queue manangement, and
> > >>> refactored the upper layer to remove zk related naming/code/comment.
> > >>>
> > >>> Add pre/postExecuteProcedures CP hooks to RegionServerObserver, and
> add
> > >>> permission check for executeProcedures method which requires the
> caller
> > >>> to
> > >>> be system user or super user.
> > >>>
> > >>> On rolling upgrade: just do not do any replication peer modifications
> > >>> during the rolling upgrading. There is no pb/layout changes on the
> > >>> peer/queue storage on zk.
> > >>>
> > >>>
> > >> And there are other benefits.
> > >> First, we have introduced a general procedure framework to send tasks
> to
> > >> RS
> > >> and report the report back to Master. It can be used to implement
> other
> > >> operations such as ACL change.
> > >> Second, zk is used as a external storage now since we do not depend on
> > zk
> > >> watcher any more, it will be much easier to implement a 'table based'
> > >> replication peer/queue storage.
> > >>
> > >> Please vote:
> > >> [+1] Agree
> > >> [-1] Disagree
> > >> [0] Neutral
> > >>
> > >> Thanks.
> > >>
> > >>
> >
> >
> > --
> >
> > -- Appy
> >
>

Reply via email to