Hi Boyang,

Thanks for updating this.  A few comments below:

In the "Routing Request Security" section, there is a reference to "older 
requests that need redirection."  But after these new revisions, both new and 
old requests need redirection.  So we should rephrase this.

> In addition, to avoid exposing this forwarding power to the admin clients,
> the routing request shall be forwarded towards the controller broker internal
> endpoint which should be only visible to other brokers inside the cluster 
> in the KIP-500 controller. Any admin configuration request with broker 
> principal should not be going through the public endpoint and will be 
> rejected for security purpose.

We should also describe how this will work in the pre-KIP-500 case.  In that 
case, CLUSTER_ACTION gets the extra permissions described here only when the 
message comes in on the inter-broker listener.  We should state that here.

(I can see that you have this information later on in the "Security Access 
Changes" section, but it would be good to have it here as well, to avoid 
confusion.)

> To be more strict of protecting controller information, the "ControllerId"
> field in new MetadataResponse shall be set to -1 when the original request
> comes from a non-broker client and it is already on v10. We shall use the
> request listener name to distinguish whether a given request is inter-broker,
> or from the client.

I'm not sure why we would need to distinguish between broker clients and 
non-broker clients.  Brokers don't generally send MetadataRequests to other 
brokers, do they?  Brokers learn about metadata from UpdateMetadataRequest and 
LeaderAndIsrRequest, not by sending MetadataRequests to other brokers.

Probably what we want here is: v0-v9: return a random broker in the cluster as 
the controller ID.  v10: no controllerID present in the MetadataResponse.  We 
should also deprecate the adminClient method which gets the controllerId.

> BROKER_AUTHORIZATION_FAILURE(92, "Authorization failed for the
> request during forwarding, this indicates an internal error on the broker
> cluster security setup.", BrokerAuthorizationFailureException::new);

Grammar nitpick: It would be good to have a period between "forwarding" and 
"this" to avoid a run-on sentence :)

best,
Colin


On Mon, Jul 27, 2020, at 21:47, Boyang Chen wrote:
> Hey there,
> 
> I'm re-opening this thread because after some initial implementations
> started, we spotted some gaps in the approved KIP as well as some
> inconsistencies with KIP-631 controller. There are a couple of addendums to
> the existing KIP, specifically:
> 
> 1. As the controller is foreseen to be only accessible to the brokers, the
> new admin client would not have direct access to the controller. It is
> guaranteed on the MetadataResponse level which no longer provides
> `ControllerId` to client side requests.
> 
> 2. The broker would forward any direct ZK path mutation requests, including
> topic creation/deletion, reassignment, etc since we deprecate the direct
> controller access on the client side. No more protocol version bump is
> necessary for the configuration requests.
> 
> 3. To make sure forwarding requests pass the authorization, broker
> principal CLUSTER_ACTION would be allowed to be used as an alternative
> authentication method for a variety of principal operations, including
> ALTER, ALTER_CONFIG, DELETE, etc. It is because the forwarding request
> needs to use the proxy broker's own principal, which is currently not
> supported to be used for many configuration change authentication listed
> above. The full list could be found in the KIP.
> 
> 4. Add a new BROKER_AUTHORIZATION_FAILURE error code to indicate any
> internal security configuration failure, when the forwarded request failed
> authentication on the controller side.
> 
> Let me know what you think. With such a major refinement of the KIP, I'm
> open for re-vote after discussions converge.
> 
> Boyang
> 
> On Wed, Jul 1, 2020 at 2:17 PM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
> 
> > Hey folks,
> >
> > I have also synced on the KIP-578 which was doing the partition limit, to
> > make sure the partition limit error code would be properly propagated once
> > it is done on top of KIP-590. Let me know if you have further questions or
> > concerns.
> >
> > Boyang
> >
> > On Tue, Jun 23, 2020 at 5:08 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> >> Thanks for the clarification, Colin and Ismael. Personally I also feel
> >> Option A is better to prioritize fixing the gap. Just to be clear, the
> >> proposed solution would be:
> >>
> >> 1. Bump the Metadata RPC version to return POLICY_VIOLATION. In the
> >> application level, we should swap the error message with the actual failure
> >> reason such as "violation of topic creation policy when attempting to auto
> >> create internal topic through MetadataRequest."
> >>
> >> 2. For older Metadata RPC, return AUTHORIZATION_FAILED to fail fast.
> >>
> >> Will address our other discussed points as well in the KIP, let me know
> >> if you have further questions.
> >>
> >> Thanks,
> >> Boyang
> >>
> >> On Tue, Jun 23, 2020 at 10:41 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >>
> >>> Option A is basically what I was thinking. But with a slight adjustment:
> >>>
> >>> New versions of MetadataResponse return POLICY_VIOLATION, old versions
> >>> return AUTHORIZATION_FAILED. The latter works correctly with old Java
> >>> clients (i.e. the client fails fast and propagates the error), I've
> >>> tested
> >>> it. Adjust new clients to treat POLICY_VIOLATION like
> >>> AUTHORIZATION_FAILED,
> >>> but propagate the custom error message.
> >>>
> >>> Ismael
> >>>
> >>> On Mon, Jun 22, 2020 at 11:00 PM Colin McCabe <cmcc...@apache.org>
> >>> wrote:
> >>>
> >>> > > > > On Fri, Jun 19, 2020 at 3:18 PM Ismael Juma <ism...@juma.me.uk>
> >>> > wrote:
> >>> > > > >
> >>> > > > > > Hi Colin,
> >>> > > > > >
> >>> > > > > > The KIP states in the Compatibility section (not Future work):
> >>> > > > > >
> >>> > > > > > "To support the proxy of requests, we need to build a channel
> >>> for
> >>> > > > > > brokers to talk directly to the controller. This part of the
> >>> design
> >>> > > > > > is internal change only and won’t block the KIP progress."
> >>> > > > > >
> >>> > > > > > I am clarifying that this is not internal only due to the
> >>> config.
> >>> > If we
> >>> > > > > > say that this KIP depends on another KIP before we can merge
> >>> > > > > > it, that's fine although it feels a bit unnecessary.
> >>> > > > > >
> >>> >
> >>> > Hi Ismael,
> >>> >
> >>> > I didn't realize there was still a reference to the separate controller
> >>> > channel in the "Compatibility, Deprecation, and Migration Plan"
> >>> section.  I
> >>> > agree that it doesn't really belong there.  Given that this is creating
> >>> > confusion, I would suggest that we just drop this from the KIP
> >>> entirely.
> >>> > It really is orthogonal to what this KIP is about-- we don't need a
> >>> > separate channel to implement redirection.
> >>> >
> >>> > Boyang wrote:
> >>> >
> >>> > >
> >>> > > We are only opening the doors for specific internal topics (offsets,
> >>> txn
> >>> > > log), which I assume the client should have no possibility to mutate
> >>> the
> >>> > > topic policy?
> >>> > >
> >>> >
> >>> > Hi Boyang,
> >>> >
> >>> > I think you and Ismael are talking about different scenarios.  You are
> >>> > describing the scenario where the broker is auto-creating the
> >>> transaction
> >>> > log topic or consumer offset topic.  This scenario indeed should not
> >>> happen
> >>> > in a properly-configured cluster.  However, Ismael is describing a
> >>> scenario
> >>> > where the client is auto-creating some arbitrary non-internal topic
> >>> just by
> >>> > sending a metadata request.
> >>> >
> >>> > As far as I can see, there are two solutions here:
> >>> >
> >>> > A. Close the hole in CreateTopicsPolicy immediately.  In new versions,
> >>> > allow MetadataResponse to return AUTHORIZATION_FAILED if we tried to
> >>> > auto-create a topic and failed.  Find some other error code to return
> >>> for
> >>> > existing versions.
> >>> >
> >>> > B. Keep the hole in CreateTopicsPolicy and add some configuration to
> >>> allow
> >>> > admins to gradually migrate to closing it.  In practice, this probably
> >>> > means a configuration toggle that enables direct ZK access, that
> >>> starts off
> >>> > as enabled.  Then we can eventually default it to false and then
> >>> remove it
> >>> > entirely over time.
> >>> >
> >>> > best,
> >>> > Colin
> >>> >
> >>>
> >>
>

Reply via email to