+1 (non-binding). Thanks for the KIP!

On Thu, Jul 18, 2019 at 5:59 PM George Li <sql_consult...@yahoo.com.invalid>
wrote:

>  +1 (non-binding)
>
>
>
> Thanks for addressing the comments.
> George
>
>     On Thursday, July 18, 2019, 05:03:58 PM PDT, Gwen Shapira <
> g...@confluent.io> wrote:
>
>  Renewing my +1, thank you Colin and Stan for working through all the
> questions, edge cases, requests and alternatives. We ended up with a
> great protocol.
>
> On Thu, Jul 18, 2019 at 4:54 PM Jason Gustafson <ja...@confluent.io>
> wrote:
> >
> > +1 Thanks for the KIP. Really looking forward to this!
> >
> > -Jason
> >
> > On Wed, Jul 17, 2019 at 1:41 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Thanks, Stanislav.  Let's restart the vote to reflect the fact that
> we've
> > > made significant changes.  The new vote will go for 3 days as usual.
> > >
> > > I'll start with my +1 (binding).
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Jul 17, 2019, at 08:56, Stanislav Kozlovski wrote:
> > > > Hey everybody,
> > > >
> > > > We have further iterated on the KIP in the accompanying discussion
> thread
> > > > and I'd like to propose we resume the vote.
> > > >
> > > > Some notable changes:
> > > > - we will store reassignment information in the
> `/brokers/topics/[topic]`
> > > > - we will internally use two collections to represent a reassignment
> -
> > > > "addingReplicas" and "removingReplicas". LeaderAndIsr has been
> updated
> > > > accordingly
> > > > - the Alter API will still use the "targetReplicas" collection, but
> the
> > > > List API will now return three separate collections - the full
> replica
> > > set,
> > > > the replicas we are adding as part of this reassignment
> > > ("addingReplicas")
> > > > and the replicas we are removing ("removingReplicas")
> > > > - cancellation of a reassignment now means a proper rollback of the
> > > > assignment to its original state prior to the API call
> > > >
> > > > As always, you can re-read the KIP here
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Wed, May 22, 2019 at 6:12 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > > >
> > > > > Hi George,
> > > > >
> > > > > Thanks for taking a look.  I am working on getting a PR done as a
> > > > > proof-of-concept.  I'll post it soon.  Then we'll finish up the
> vote.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > > On Tue, May 21, 2019, at 17:33, George Li wrote:
> > > > > >  Hi Colin,
> > > > > >
> > > > > >  Great! Looking forward to these features.    +1 (non-binding)
> > > > > >
> > > > > > What is the estimated timeline to have this implemented?  If any
> help
> > > > > > is needed in the implementation of cancelling reassignments,  I
> can
> > > > > > help if there is spare cycle.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > George
> > > > > >
> > > > > >
> > > > > >
> > > > > >    On Thursday, May 16, 2019, 9:48:56 AM PDT, Colin McCabe
> > > > > > <cmcc...@apache.org> wrote:
> > > > > >
> > > > > >  Hi George,
> > > > > >
> > > > > > Yes, KIP-455 allows the reassignment of individual partitions to
> be
> > > > > > cancelled.  I think it's very important for these operations to
> be at
> > > > > > the partition level.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > On Tue, May 14, 2019, at 16:34, George Li wrote:
> > > > > > >  Hi Colin,
> > > > > > >
> > > > > > > Thanks for the updated KIP.  It has very good improvements of
> Kafka
> > > > > > > reassignment operations.
> > > > > > >
> > > > > > > One question, looks like the KIP includes the Cancellation of
> > > > > > > individual pending reassignments as well when the
> > > > > > > AlterPartitionReasisgnmentRequest has empty replicas for the
> > > > > > > topic/partition. Will you also be implementing the the
> partition
> > > > > > > cancellation/rollback in the PR ?    If yes,  it will make
> KIP-236
> > > (it
> > > > > > > has PR already) trivial, since the cancel all pending
> > > reassignments,
> > > > > > > one just needs to do a ListPartitionRessignmentRequest, then
> submit
> > > > > > > empty replicas for all those topic/partitions in
> > > > > > > one AlterPartitionReasisgnmentRequest.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > George
> > > > > > >
> > > > > > >    On Friday, May 10, 2019, 8:44:31 PM PDT, Colin McCabe
> > > > > > > <cmcc...@apache.org> wrote:
> > > > > > >
> > > > > > >  On Fri, May 10, 2019, at 17:34, Colin McCabe wrote:
> > > > > > > > On Fri, May 10, 2019, at 16:43, Jason Gustafson wrote:
> > > > > > > > > Hi Colin,
> > > > > > > > >
> > > > > > > > > I think storing reassignment state at the partition level
> is
> > > the
> > > > > right move
> > > > > > > > > and I also agree that replicas should understand that
> there is
> > > a
> > > > > > > > > reassignment in progress. This makes KIP-352 a trivial
> > > follow-up
> > > > > for
> > > > > > > > > example. The only doubt I have is whether the leader and
> isr
> > > znode
> > > > > is the
> > > > > > > > > right place to store the target reassignment. It is a bit
> odd
> > > to
> > > > > keep the
> > > > > > > > > target assignment in a separate place from the current
> > > assignment,
> > > > > right? I
> > > > > > > > > assume the thinking is probably that although the current
> > > > > assignment should
> > > > > > > > > probably be in the leader and isr znode as well, it is
> hard to
> > > > > move the
> > > > > > > > > state in a compatible way. Is that right? But if we have no
> > > plan
> > > > > to remove
> > > > > > > > > the assignment znode, do you see a downside to storing the
> > > target
> > > > > > > > > assignment there as well?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hi Jason,
> > > > > > > >
> > > > > > > > That's a good point -- it's probably better to keep the
> target
> > > > > > > > assignment in the same znode as the current assignment, for
> > > > > > > > consistency.  I'll change the KIP.
> > > > > > >
> > > > > > > Hi Jason,
> > > > > > >
> > > > > > > Thanks again for the review.
> > > > > > >
> > > > > > > I took another look at this, and I think we should stick with
> the
> > > > > > > initial proposal of putting the reassignment state into
> > > > > > > /brokers/topics/[topic]/partitions/[partitionId]/state.  The
> > > reason is
> > > > > > > because we'll want to bump the leader epoch for the partition
> when
> > > > > > > changing the reassignment state, and the leader epoch resides
> in
> > > that
> > > > > > > znode anyway.  I agree there is some inconsistency here, but
> so be
> > > it:
> > > > > > > if we were to greenfield these zookeeper data structures, we
> might
> > > do
> > > > > > > it differently, but the proposed scheme will work fine and be
> > > > > > > extensible for the future.
> > > > > > >
> > > > > > > >
> > > > > > > > > A few additional questions:
> > > > > > > > >
> > > > > > > > > 1. Should `alterPartitionReassignments` be
> > > > > `alterPartitionAssignments`?
> > > > > > > > > It's the current assignment we're altering, right?
> > > > > > > >
> > > > > > > > That's fair.  AlterPartitionAssigments reads a little
> better, and
> > > > > I'll
> > > > > > > > change it to that.
> > > > > > >
> > > > > > > +1.  I've changed the RPC and API name in the wiki.
> > > > > > >
> > > > > > > >
> > > > > > > > > 2. Does this change affect the Metadata API? In other
> words,
> > > are
> > > > > clients
> > > > > > > > > aware of reassignments? If so, then we probably need a
> change
> > > to
> > > > > > > > > UpdateMetadata as well. The only alternative I can think of
> > > would
> > > > > be to
> > > > > > > > > represent the replica set in the Metadata request as the
> union
> > > of
> > > > > the
> > > > > > > > > current and target replicas, but I can't think of any
> benefit
> > > to
> > > > > hiding
> > > > > > > > > reassignments. Note that if we did this, we probably
> wouldn't
> > > need
> > > > > a
> > > > > > > > > separate API to list reassignments.
> > > > > > > >
> > > > > > > > I thought about this a bit... and I think on balance, you're
> > > right.
> > > > > We
> > > > > > > > should keep this information together with the replica
> nodes, isr
> > > > > > > > nodes, and offline replicas, and that information is
> available in
> > > > > the
> > > > > > > > MetadataResponse.
> > > > > > > >  However, I do think in order to do this, we'll need a flag
> in
> > > the
> > > > > > > > MetadataRequest that specifiies "only show me reassigning
> > > > > partitions".
> > > > > > > > I'll add this.
> > > > > > >
> > > > > > > I revisited this, and I think we should stick with the original
> > > > > > > proposal of having a separate ListPartitionReassignments API.
> > > There
> > > > > > > really is no use case where the Producer or Consumer needs to
> know
> > > > > > > about a reassignment.  They should just be notified when the
> set of
> > > > > > > partitions changes, which doesn't require changes to
> > > > > > > MetadataRequest/Response.  The Admin client only cares if
> someone
> > > is
> > > > > > > managing the reassignment.  So adding this state to the
> > > > > > > MetadataResponse adds overhead for no real benefit.  In the
> common
> > > > > case
> > > > > > > where there is no ongoing reassignment, it would be 4 bytes per
> > > > > > > partition of extra overhead in the MetadataResponse.
> > > > > > >
> > > > > > > In general, I think we have a problem of oversharing in the
> > > > > > > MetadataRequest/Response.  As we 10x or 100x the number of
> > > partitions
> > > > > > > we support, we'll need to get stricter about giving clients
> only
> > > the
> > > > > > > information they actually need, about the partitions they
> actually
> > > > > care
> > > > > > > about.  Reassignment state clearly falls in the category of
> state
> > > that
> > > > > > > isn't needed by clients (except very specialized rebalancing
> > > programs).
> > > > > > >
> > > > > > > Another important consideration here is that someone managing
> an
> > > > > > > ongoing reassignment wants the most up-to-date information,
> which
> > > is
> > > > > to
> > > > > > > be found on the controller.  Therefore adding this state to
> > > listTopics
> > > > > > > or describeTopics, which could contact any node in the
> cluster, is
> > > > > > > sub-optimal.
> > > > > > >
> > > > > > > Finally, adding this to listTopics or describeTopics feels
> like a
> > > > > warty
> > > > > > > API.  It's an extra boolean which interacts with other extra
> > > booleans
> > > > > > > like "show internal", etc. in weird ways.  I think a separate
> API
> > > is
> > > > > > > cleaner.
> > > > > > >
> > > > > > > >
> > > > > > > > > 3. As replicas come into sync, they will join the ISR.
> Will we
> > > > > await all
> > > > > > > > > target replicas joining the ISR before taking the replica
> out
> > > of
> > > > > the target
> > > > > > > > > replicas set? Also, I assume that target replicas can
> still be
> > > > > elected as
> > > > > > > > > leader?
> > > > > > > >
> > > > > > > > We'll take a replica out of the target replicas set as soon
> as
> > > that
> > > > > > > > replica is in the ISR.  Let me clarify this in the KIP.
> > > > > > > >
> > > > > > > > > 4. Probably useful to mention permissions for the new APIs.
> > > > > > > >
> > > > > > > > Good point.  I think alterPartitionAssignments should require
> > > ALTER
> > > > > on
> > > > > > > > CLUSTER.  MetadataRequest permissions will be unchanged.
> > > > > > >
> > > > > > > I added permission information.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jason
> > > > > > > > >
> > > > > > > > > On Fri, May 10, 2019 at 9:30 AM Gwen Shapira <
> > > g...@confluent.io>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 (binding)
> > > > > > > > > > Looks great, and will be awesome to have this new
> capability.
> > > > > > > > > >
> > > > > > > > > > On Wed, May 8, 2019 at 10:23 PM Colin McCabe <
> > > cmcc...@apache.org>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start the vote for KIP-455: Create an
> > > > > Administrative API for
> > > > > > > > > > > Replica Reassignment.  I think this KIP is important
> since
> > > it
> > > > > will unlock
> > > > > > > > > > > many follow-on improvements to Kafka reassignment (see
> the
> > > > > "Future work"
> > > > > > > > > > > section, plus a lot of the other discussions we've had
> > > > > recently about
> > > > > > > > > > > reassignment).  It also furthers the important KIP-4
> goal
> > > of
> > > > > removing
> > > > > > > > > > > direct access to ZK.
> > > > > > > > > > >
> > > > > > > > > > > I made a few changes based on the discussion in the
> > > [DISCUSS]
> > > > > thread.  As
> > > > > > > > > > > Robert suggested, I removed the need to explicitly
> cancel a
> > > > > reassignment
> > > > > > > > > > > for a partition before setting up a different
> reassignment
> > > for
> > > > > that
> > > > > > > > > > > specific partition.  I also simplified the API a bit by
> > > adding
> > > > > a
> > > > > > > > > > > PartitionReassignment class which is used by both the
> alter
> > > > > and list
> > > > > > > > > > APIs.
> > > > > > > > > > >
> > > > > > > > > > > I modified the proposal so that we now deprecate the
> old
> > > > > znode-based API
> > > > > > > > > > > rather than removing it completely.  That should give
> > > external
> > > > > > > > > > rebalancing
> > > > > > > > > > > tools some time to transition to the new API.
> > > > > > > > > > >
> > > > > > > > > > > To clarify a question Viktor asked, I added a note
> that the
> > > > > > > > > > > kafka-reassign-partitions.sh will now use a
> > > --bootstrap-server
> > > > > argument
> > > > > > > > > > to
> > > > > > > > > > > contact the admin APIs.
> > > > > > > > > > >
> > > > > > > > > > > thanks,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > *Gwen Shapira*
> > > > > > > > > > Product Manager | Confluent
> > > > > > > > > > 650.450.2760 | @gwenshap
> > > > > > > > > > Follow us: Twitter <https://twitter.com/ConfluentInc> |
> blog
> > > > > > > > > > <http://www.confluent.io/blog>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>

Reply via email to