[
https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167800#comment-15167800
]
Parth Brahmbhatt edited comment on KAFKA-1696 at 2/25/16 8:23 PM:
------------------------------------------------------------------
So here is how that request path would work in my mind:
* Client sends request for token acquisition to any broker.
* Broker forwards the request to the controller.
* Controller generates the token and pushes the tokens to all brokers. (Will
need a new API)
* Controller responds back to original broker with the token.
* Broker responds back to client with the token.
Renewal is pretty much the same.
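Just to make the flow concrete, here is a rough Java sketch of the broker-side handling described above; every class and method name in it (handleTokenRequest, pushTokenToAllBrokers, etc.) is made up for illustration and is not part of the proposed public API:
{code:java}
// Hypothetical sketch of the acquisition path above; all names are illustrative only.
public final class TokenAcquisitionSketch {

    /** Any broker can receive the request; a non-controller broker forwards it. */
    DelegationTokenResponse handleTokenRequest(DelegationTokenRequest request) {
        if (isController()) {
            DelegationToken token = generateToken(request.owner());
            pushTokenToAllBrokers(token);   // would require a new controller-to-broker API
            return new DelegationTokenResponse(token);
        }
        // The controller replies to this broker, and this broker relays the token to the client.
        return forwardToController(request);
    }

    // --- stubs standing in for broker internals (all hypothetical) ---
    private boolean isController() { return false; }
    private DelegationToken generateToken(String owner) { return new DelegationToken(owner); }
    private void pushTokenToAllBrokers(DelegationToken token) { /* fan out to live brokers */ }
    private DelegationTokenResponse forwardToController(DelegationTokenRequest request) {
        return new DelegationTokenResponse(new DelegationToken(request.owner()));
    }

    static final class DelegationToken {
        final String owner;
        DelegationToken(String owner) { this.owner = owner; }
    }

    static final class DelegationTokenRequest {
        private final String owner;
        DelegationTokenRequest(String owner) { this.owner = owner; }
        String owner() { return owner; }
    }

    static final class DelegationTokenResponse {
        final DelegationToken token;
        DelegationTokenResponse(DelegationToken token) { this.token = token; }
    }
}
{code}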
The race condition you are describing can still happen in the above case during
renewal, because the controller may have pushed the renewal information to only
a subset of brokers and then died. Depending on which broker a client connects
to, it may get an exception or a success. I do agree, though, that since the
controller would not have responded back with success, the original renew
request should be retried, which most likely avoids the scenario.
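As a rough illustration of that retry behavior, assuming a hypothetical client-side renewToken call that throws a retriable exception when the controller dies before acknowledging (none of these names are part of the proposal):
{code:java}
// Hypothetical client-side retry for renewal; all names are illustrative, not a proposed API.
public final class RenewRetrySketch {

    interface TokenClient {
        /** Returns the new expiry timestamp on success. */
        long renewToken(byte[] tokenId) throws RetriableTokenException;
    }

    static final class RetriableTokenException extends Exception { }

    static long renewWithRetry(TokenClient client, byte[] tokenId, int maxAttempts)
            throws RetriableTokenException, InterruptedException {
        RetriableTokenException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The controller only acks after pushing the renewal to all brokers, so a
                // successful response means the whole cluster should agree on the new expiry.
                return client.renewToken(tokenId);
            } catch (RetriableTokenException e) {
                lastFailure = e;   // controller may have died mid-push; the retry reaches the new controller
                Thread.sleep(1000L);
            }
        }
        throw lastFailure;
    }
}
{code}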
If the above steps seem right, here are the advantages and disadvantages of this
approach:
Advantages:
* Token generation/renewal will not involve ZooKeeper. I am not too worried
about the load this adds to ZooKeeper, but avoiding it definitely seems more
secure and follows the Hadoop model more closely. However, ZooKeeper needs to be
secured for a lot of other things in Kafka anyway, so I am not sure this should
really be a concern.
* Clients will get better consistency.
Disadvantages:
* We will have to add new APIs to support the controller pushing tokens to
brokers, on top of the minimal APIs that are currently proposed (a rough sketch
follows this list). I would like the publicly available APIs to stay minimal and
to be things we expect clients to use, and this adds more development
complexity. Overall this seems like a more philosophical point, so depending on
who you ask they may or may not see it as a disadvantage.
* We will also have to add APIs to support the bootstrapping case. What I mean
is, when a new broker comes up it will have to fetch all delegation tokens from
the controller, so we will again need new APIs like getAllTokens (also sketched
below). Again, some of us may see that as a disadvantage and some may not.
* In catastrophic failures where all brokers go down, the tokens will be lost
even if the servers are restarted, since the tokens are not persisted anywhere.
Granted, if something like this happens the customer has bigger things to worry
about, but not having to regenerate/redistribute tokens is one less thing.
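To make the extra API surface concrete, here is a rough sketch of the kind of internal interfaces the controller-push approach would need; pushTokens, getAllTokens, and the token fields shown are hypothetical names for illustration, not part of the current proposal:
{code:java}
import java.util.List;

// Hypothetical internal interfaces implied by the controller-push approach; names are illustrative only.
public interface TokenDistributionSketch {

    /** Controller -> broker: distribute newly issued or renewed tokens. */
    void pushTokens(List<DelegationToken> tokens);

    /** Broker -> controller: bootstrap case, fetch the full token set when a broker starts up. */
    List<DelegationToken> getAllTokens();

    /** Minimal token placeholder for this sketch. */
    final class DelegationToken {
        public final byte[] tokenId;
        public final long expiryTimestampMs;

        public DelegationToken(byte[] tokenId, long expiryTimestampMs) {
            this.tokenId = tokenId;
            this.expiryTimestampMs = expiryTimestampMs;
        }
    }
}
{code}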
I don't see strong reasons to go one way or the other, so I would still like to
go with ZooKeeper, but I don't feel strongly about it. If you think I have
mischaracterized what you were proposing, feel free to add more details or list
any other advantages/disadvantages.
> Kafka should be able to generate Hadoop delegation tokens
> ---------------------------------------------------------
>
> Key: KAFKA-1696
> URL: https://issues.apache.org/jira/browse/KAFKA-1696
> Project: Kafka
> Issue Type: Sub-task
> Components: security
> Reporter: Jay Kreps
> Assignee: Parth Brahmbhatt
>
> For access from MapReduce/etc jobs run on behalf of a user.