I've updated the KIP with the feedback from this discussion
https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller.
I'll send out the vote thread shortly.

Thanks again,
David

On Tue, May 5, 2020 at 10:34 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Colin,
>
> Yeah, that makes sense, thanks. I was thinking, longer term, that there are
> other benefits to having the log dir information available to the
> controller. For example it would allow the possibility for CREATE_TOPIC
> requests to include the intended log dir for each replica. But that's
> obviously completely out of scope for this KIP.
>
> Kind regards,
>
> Tom
>
> On Mon, May 4, 2020 at 10:11 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Tom,
> >
> > As you said, the controller doesn't know about log directories, although
> > individual brokers do.  So the brokers do currently have to enumerate all
> > the partitions that need to be removed to the controllers explicitly.  So
> > this KIP isn't changing anything in that regard.
> >
> > The current flow is:
> > 1. ping ZK back-channel
> > 2. controller sends a full LeaderAndIsrRequest to the broker
> > 3. the broker sends a full response containing error codes for all
> > partitions.  Partitions on the failed storage have a nonzero error code;
> > the others have 0.
> >
> > The new flow is:
> > 1. the broker sends an RPC with all the failed partitions
> >
> > So the new flow actually substantially reduces the amount of network
> > traffic, since previously we sent a full LeaderAndIsrRequest containing
> all
> > of the partitions.  Now we just send all the partitions in the failed
> > storage directory.  That could still be a lot, but certainly only be a
> > fraction of what a full LeaderAndIsrRequest would have.
> >
> > Sorry if I'm repeating stuff you already figured out, but I just wanted
> to
> > be more clear about this (I found it confusing too until David explained
> it
> > to me originally...)
> >
> > best,
> > Colin
> >
> >
> > On Sat, May 2, 2020, at 10:30, Tom Bentley wrote:
> > > Hi David,
> > >
> > > > In the rejecting the alternative of having an RPC for log dir
> failures
> > > > you say
> > > >
> > > > I guess what I really mean here is that I wanted to avoid exposing
> the
> > > > notion of a log dir to the controller. I can update the description
> to
> > > > reflect this.
> > > >
> > >
> > > Ah, I think I see now. While each broker knows about its log dirs this
> > > isn't something that's stored in zookeeper or known to the controller.
> > >
> > >
> > > > > It's also not completely clear that the cost of having to enumerate
> > all
> > > > the partitions on a log dir was weighed against the perceived benefit
> > of a
> > > > more flexible RPC.
> > > >
> > > > The enumeration isn't strictly required. In the "RPC semantics"
> > section, I
> > > > mention that if no topics are present in the RPC request, then all
> > topics
> > > > on the broker are implied. And if a topic is given with no
> partitions,
> > all
> > > > partitions for that topic (on the broker) are implied. Does this make
> > > > sense?
> > > >
> > >
> > > So the no-topics-present optimisation wouldn't be available to a broker
> > > with >1 log dirs where only some of the log dirs failed. I don't
> suppose
> > > that's a problem though.
> > >
> > > Thanks again,
> > >
> > > Tom
> > >
> > >
> > > On Fri, May 1, 2020 at 5:48 PM David Arthur <mum...@gmail.com> wrote:
> > >
> > > > Jose/Colin/Tom, thanks for the feedback!
> > > >
> > > > > Partition level errors
> > > >
> > > > This was an oversight on my part, I meant to include these in the
> > response
> > > > RPC. I'll update that.
> > > >
> > > > > INVALID_REQUEST
> > > >
> > > > I'll update this text description, that was a copy/paste left over
> > > >
> > > > > I think we should mention that the controller will keep it's
> current
> > > > implementation of marking the replicas as offline because of failure
> > in the
> > > > LeaderAndIsr response.
> > > >
> > > > Good suggestions, I'll add that.
> > > >
> > > > > Does EventType need to be an Int32?
> > > >
> > > > No, it doesn't. I'll update to Int8. Do we have an example of the
> enum
> > > > paradigm in our RPC today? I'm curious if we actually map it to a
> real
> > Java
> > > > enum in the AbstractRequest/Response classes.
> > > >
> > > > > AlterReplicaStates
> > > >
> > > > Sounds good to me.
> > > >
> > > > > In the rejecting the alternative of having an RPC for log dir
> > failures
> > > > you say
> > > >
> > > > I guess what I really mean here is that I wanted to avoid exposing
> the
> > > > notion of a log dir to the controller. I can update the description
> to
> > > > reflect this.
> > > >
> > > > > It's also not completely clear that the cost of having to enumerate
> > all
> > > > the partitions on a log dir was weighed against the perceived benefit
> > of a
> > > > more flexible RPC.
> > > >
> > > > The enumeration isn't strictly required. In the "RPC semantics"
> > section, I
> > > > mention that if no topics are present in the RPC request, then all
> > topics
> > > > on the broker are implied. And if a topic is given with no
> partitions,
> > all
> > > > partitions for that topic (on the broker) are implied. Does this make
> > > > sense?
> > > >
> > > > Thanks again! I'll update the KIP and leave a message here once it's
> > > > revised.
> > > >
> > > > David
> > > >
> > > > On Wed, Apr 29, 2020 at 11:20 AM Tom Bentley <tbent...@redhat.com>
> > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > In the rejecting the alternative of having an RPC for log dir
> > failures
> > > > you
> > > > > say:
> > > > >
> > > > > It was also rejected to prevent "leaking" the notion of a log dir
> to
> > the
> > > > > > public API.
> > > > > >
> > > > >
> > > > > I'm not quite sure I follow that argument, since we already have
> > RPCs for
> > > > > changing replica log dirs. So in a general sense log dirs already
> > exist
> > > > in
> > > > > the API. I suspect you were using public API to mean something more
> > > > > specific; could you elaborate?
> > > > >
> > > > > It's also not completely clear that the cost of having to enumerate
> > all
> > > > the
> > > > > partitions on a log dir was weighed against the perceived benefit
> of
> > a
> > > > more
> > > > > flexible RPC. (I'm sure it was, but it would be good to say so).
> > > > >
> > > > > Many thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Wed, Apr 29, 2020 at 12:04 AM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > >
> > > > > > I think the ReplicaStateEventResponse should have a separate
> error
> > code
> > > > > > for each partition.
> > > > > >  Currently it just has one error code for the whole
> > request/response,
> > > > if
> > > > > > I'm reading this right.  I think Jose made a similar point as
> > well.  We
> > > > > > should plan for scenarios where some replica states can be
> changed
> > and
> > > > > some
> > > > > > can't.
> > > > > >
> > > > > > Does EventType need to be an Int32?  For enums, we usually use
> the
> > > > > > smallest reasonable type, which would be Int8 here.  We can
> always
> > > > change
> > > > > > the schema later if needed.  UNKNOWN_REPLICA_EVENT_TYPE seems
> > > > unnecessary
> > > > > > since INVALID_REQUEST covers this case.
> > > > > >
> > > > > > I'd also suggest "AlterReplicaStates[Request,Response]" as a
> > slightly
> > > > > > better name for this RPC.
> > > > > >
> > > > > > cheers,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 7, 2020, at 12:43, David Arthur wrote:
> > > > > > > Hey everyone,
> > > > > > >
> > > > > > > I'd like to start the discussion for KIP-589, part of the
> KIP-500
> > > > > effort
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > >
> > > > > > > This is a proposal to use a new RPC instead of ZooKeeper for
> > > > notifying
> > > > > > the
> > > > > > > controller of an offline replica. Please give a read and let me
> > know
> > > > > your
> > > > > > > thoughts.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > David
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> >
> >
>


-- 
David Arthur

Reply via email to