Hi, Colin, Boyang,

Colin, thanks for the clarification. Somehow, I thought that even if the
controller is ran independently, it
would still run the listeners of the broker and thus would be accessible by
redirecting on the loopback
interface. My mistake.

Boyang, I have few questions/comments regarding the updated KIP:

1. I think that it would be great if we could clarify how old admin clients
which are directly talking to the
controller will work with this KIP. I read between the lines that, as we
propose to provide a random
broker Id as the controller Id in the metadata response, they will use a
single node as a proxy. Is that
correct? This deserves to be called out more explicitly in the design
section instead of being hidden
in the protocol bump of the metadata RPC.

1.1 If I understand correctly, could we assume that old admin clients will
stick to the same "fake controller"
until they refresh their metadata? Refreshing the metadata usually happens
when NOT_CONTROLLER
is received but this won't happen anymore so they should change
infrequently.

2. For the new admin client, I suppose that we plan on using
LeastLoadedNodeProvider for the
requests that are using ControllerNodeProvider. We could perhaps mention it.

3. Pre KIP-500, will we have a way to distinguish if a request that is
received by the controller is
coming directly from a client or from a broker? You mention that the
listener can be used to do
this but as you pointed out, it is not mandatory. Do we have another
reliable method? I am asking
in the context of KIP-599 with the current controller, we may need to
throttle differently if the
request comes from a client or from a broker.

4. Could we add `InitialClientId` as well? This will be required for the
quota as we can apply them
by principal and/or clientId.

5. A small remark regarding the structure of the KIP. It is a bit weird
that requests that do not go
to the controller are mentioned in the Proposed Design section and the
requests that go to the
controller are mentioned in the Public Interfaces. When one read the
Proposed Design, it does not
get a full picture of the whole new routing proposal for old and new
clients. It would be great if we
could have a full overview in that section.

Overall the change makes sense to me. I will work on drafting an addendum
to KIP-599 to
alter the design to cope with these changes. At a first glance, that seems
doable if 1.1, 3
and 4 are OK.

Thanks,
David

On Wed, Jul 29, 2020 at 5:29 AM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Thanks for the feedback Colin!
>
> On Tue, Jul 28, 2020 at 2:11 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Boyang,
> >
> > Thanks for updating this.  A few comments below:
> >
> > In the "Routing Request Security" section, there is a reference to "older
> > requests that need redirection."  But after these new revisions, both new
> > and old requests need redirection.  So we should rephrase this.
> >
> > > In addition, to avoid exposing this forwarding power to the admin
> > clients,
> > > the routing request shall be forwarded towards the controller broker
> > internal
> > > endpoint which should be only visible to other brokers inside the
> > cluster
> > > in the KIP-500 controller. Any admin configuration request with broker
> > > principal should not be going through the public endpoint and will be
> > > rejected for security purpose.
> >
> > We should also describe how this will work in the pre-KIP-500 case.  In
> > that case, CLUSTER_ACTION gets the extra permissions described here only
> > when the message comes in on the inter-broker listener.  We should state
> > that here.
> >
> > (I can see that you have this information later on in the "Security
> Access
> > Changes" section, but it would be good to have it here as well, to avoid
> > confusion.)
> >
> > > To be more strict of protecting controller information, the
> > "ControllerId"
> > > field in new MetadataResponse shall be set to -1 when the original
> > request
> > > comes from a non-broker client and it is already on v10. We shall use
> the
> > > request listener name to distinguish whether a given request is
> > inter-broker,
> > > or from the client.
> >
> > I'm not sure why we would need to distinguish between broker clients and
> > non-broker clients.  Brokers don't generally send MetadataRequests to
> other
> > brokers, do they?  Brokers learn about metadata from
> UpdateMetadataRequest
> > and LeaderAndIsrRequest, not by sending MetadataRequests to other
> brokers.
> >
> > We do have one use case where the MetadataRequest gets sent between the
> brokers, which is the InterBrokerSendThread. Currently we don't rely on it
> to get the controller id, so I guess your suggestion should be good to
> enforce. We could use some meta comment on the NetworkClient that it should
> not be used to get the controller location.
>
> Probably what we want here is: v0-v9: return a random broker in the cluster
> > as the controller ID.  v10: no controllerID present in the
> > MetadataResponse.  We should also deprecate the adminClient method which
> > gets the controllerId.
> >
> > > BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the
> > > request during forwarding, this indicates an internal error on the
> broker
> > > cluster security setup.", BrokerAuthorizationFailureException::new);
> >
> > Grammar nitpick: It would be good to have a period between "forwarding"
> > and "this" to avoid a run-on sentence :)
> >
> > best,
> > Colin
> >
> >
> > On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> > > Hey there,
> > >
> > > I'm re-opening this thread because after some initial implementations
> > > started, we spotted some gaps in the approved KIP as well as some
> > > inconsistencies with KIP-631 controller. There are a couple of
> addendums
> > to
> > > the existing KIP, specifically:
> > >
> > > 1. As the controller is foreseen to be only accessible to the brokers,
> > the
> > > new admin client would not have direct access to the controller. It is
> > > guaranteed on the MetadataResponse level which no longer provides
> > > `ControllerId` to client side requests.
> > >
> > > 2. The broker would forward any direct ZK path mutation requests,
> > including
> > > topic creation/deletion, reassignment, etc since we deprecate the
> direct
> > > controller access on the client side. No more protocol version bump is
> > > necessary for the configuration requests.
> > >
> > > 3. To make sure forwarding requests pass the authorization, broker
> > > principal CLUSTER_ACTION would be allowed to be used as an alternative
> > > authentication method for a variety of principal operations, including
> > > ALTER, ALTER_CONFIG, DELETE, etc. It is because the forwarding request
> > > needs to use the proxy broker's own principal, which is currently not
> > > supported to be used for many configuration change authentication
> listed
> > > above. The full list could be found in the KIP.
> > >
> > > 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate any
> > > internal security configuration failure, when the forwarded request
> > failed
> > > authentication on the controller side.
> > >
> > > Let me know what you think. With such a major refinement of the KIP,
> I'm
> > > open for re-vote after discussions converge.
> > >
> > > Boyang
> > >
> > > On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com
> >
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > > I have also synced on the KIP-578 which was doing the partition
> limit,
> > to
> > > > make sure the partition limit error code would be properly propagated
> > once
> > > > it is done on top of KIP-590. Let me know if you have further
> > questions or
> > > > concerns.
> > > >
> > > > Boyang
> > > >
> > > > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <
> > reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks for the clarification, Colin and Ismael. Personally I also
> feel
> > > >> Option A is better to prioritize fixing the gap. Just to be clear,
> the
> > > >> proposed solution would be:
> > > >>
> > > >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. In the
> > > >> application level, we should swap the error message with the actual
> > failure
> > > >> reason such as "violation of topic creation policy when attempting
> to
> > auto
> > > >> create internal topic through MetadataRequest."
> > > >>
> > > >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail fast.
> > > >>
> > > >> Will address our other discussed points as well in the KIP, let me
> > know
> > > >> if you have further questions.
> > > >>
> > > >> Thanks,
> > > >> Boyang
> > > >>
> > > >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > >>
> > > >>> Option A is basically what I was thinking. But with a slight
> > adjustment:
> > > >>>
> > > >>> New versions of MetadataResponse return POLICY_VIOLATION, old
> > versions
> > > >>> return AUTHORIZATION_FAILED. The latter works correctly with old
> Java
> > > >>> clients (i.e. the client fails fast and propagates the error), I've
> > > >>> tested
> > > >>> it. Adjust new clients to treat POLICY_VIOLATION like
> > > >>> AUTHORIZATION_FAILED,
> > > >>> but propagate the custom error message.
> > > >>>
> > > >>> Ismael
> > > >>>
> > > >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org>
> > > >>> wrote:
> > > >>>
> > > >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <
> > ism...@juma.me.uk>
> > > >>> > wrote:
> > > >>> > > > >
> > > >>> > > > > > Hi Colin,
> > > >>> > > > > >
> > > >>> > > > > > The KIP states in the Compatibility section (not Future
> > work):
> > > >>> > > > > >
> > > >>> > > > > > "To support the proxy of requests, we need to build a
> > channel
> > > >>> for
> > > >>> > > > > > brokers to talk directly to the controller. This part of
> > the
> > > >>> design
> > > >>> > > > > > is internal change only and won’t block the KIP
> progress."
> > > >>> > > > > >
> > > >>> > > > > > I am clarifying that this is not internal only due to the
> > > >>> config.
> > > >>> > If we
> > > >>> > > > > > say that this KIP depends on another KIP before we can
> > merge
> > > >>> > > > > > it, that's fine although it feels a bit unnecessary.
> > > >>> > > > > >
> > > >>> >
> > > >>> > Hi Ismael,
> > > >>> >
> > > >>> > I didn't realize there was still a reference to the separate
> > controller
> > > >>> > channel in the "Compatibility, Deprecation, and Migration Plan"
> > > >>> section.  I
> > > >>> > agree that it doesn't really belong there.  Given that this is
> > creating
> > > >>> > confusion, I would suggest that we just drop this from the KIP
> > > >>> entirely.
> > > >>> > It really is orthogonal to what this KIP is about-- we don't
> need a
> > > >>> > separate channel to implement redirection.
> > > >>> >
> > > >>> > Boyang wrote:
> > > >>> >
> > > >>> > >
> > > >>> > > We are only opening the doors for specific internal topics
> > (offsets,
> > > >>> txn
> > > >>> > > log), which I assume the client should have no possibility to
> > mutate
> > > >>> the
> > > >>> > > topic policy?
> > > >>> > >
> > > >>> >
> > > >>> > Hi Boyang,
> > > >>> >
> > > >>> > I think you and Ismael are talking about different scenarios.
> You
> > are
> > > >>> > describing the scenario where the broker is auto-creating the
> > > >>> transaction
> > > >>> > log topic or consumer offset topic.  This scenario indeed should
> > not
> > > >>> happen
> > > >>> > in a properly-configured cluster.  However, Ismael is describing
> a
> > > >>> scenario
> > > >>> > where the client is auto-creating some arbitrary non-internal
> topic
> > > >>> just by
> > > >>> > sending a metadata request.
> > > >>> >
> > > >>> > As far as I can see, there are two solutions here:
> > > >>> >
> > > >>> > A. Close the hole in CreateTopicsPolicy immediately.  In new
> > versions,
> > > >>> > allow MetadataResponse to return AUTHORIZATION_FAILED if we tried
> > to
> > > >>> > auto-create a topic and failed.  Find some other error code to
> > return
> > > >>> for
> > > >>> > existing versions.
> > > >>> >
> > > >>> > B. Keep the hole in CreateTopicsPolicy and add some configuration
> > to
> > > >>> allow
> > > >>> > admins to gradually migrate to closing it.  In practice, this
> > probably
> > > >>> > means a configuration toggle that enables direct ZK access, that
> > > >>> starts off
> > > >>> > as enabled.  Then we can eventually default it to false and then
> > > >>> remove it
> > > >>> > entirely over time.
> > > >>> >
> > > >>> > best,
> > > >>> > Colin
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Reply via email to