[ https://issues.apache.org/jira/browse/KAFKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167800#comment-15167800 ]

Parth Brahmbhatt edited comment on KAFKA-1696 at 2/25/16 8:23 PM:
------------------------------------------------------------------

So here is how that request path would work in my mind:

* Client sends request for token acquisition to any broker.
* Broker forwards the request to the controller.
* Controller generates the token and pushes it to all brokers. (This will 
need a new API.)
* Controller responds back to original broker with the token.
* Broker responds back to client with the token.

Renewal is pretty much the same.
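
To make that flow concrete, here is a minimal controller-side sketch. All of 
the names in it (DelegationToken, BrokerChannel, TokenGenerator) are 
hypothetical placeholders for this discussion, not proposed Kafka APIs.

{code:java}
// Hypothetical sketch of the flow above; none of these types exist in Kafka.
// DelegationToken, BrokerChannel and TokenGenerator are placeholder names that
// only illustrate client -> broker -> controller -> all brokers -> client.
import java.util.List;

public class ControllerTokenManager {

    /** Hypothetical token payload. */
    record DelegationToken(String tokenId, String owner, long expiryMs, byte[] hmac) {}

    /** Hypothetical inter-broker channel carrying the new "push token" API. */
    interface BrokerChannel {
        void updateToken(DelegationToken token);
    }

    /** Hypothetical generator that creates/signs tokens on the controller. */
    interface TokenGenerator {
        DelegationToken generate(String ownerPrincipal);
    }

    private final List<BrokerChannel> liveBrokers;
    private final TokenGenerator generator;

    ControllerTokenManager(List<BrokerChannel> liveBrokers, TokenGenerator generator) {
        this.liveBrokers = liveBrokers;
        this.generator = generator;
    }

    /**
     * Invoked when a broker forwards a client's token-acquisition request.
     * The controller generates the token, pushes it to every live broker and
     * only then returns it, so a success response implies every broker has it.
     */
    DelegationToken acquireToken(String ownerPrincipal) {
        DelegationToken token = generator.generate(ownerPrincipal);
        for (BrokerChannel broker : liveBrokers) {
            broker.updateToken(token);   // the new controller -> broker API
        }
        return token;                    // back to the forwarding broker, then the client
    }
}
{code}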

The race condition you are describing can still happen in the above case during 
renewal, because the controller may have pushed the renewal information to only 
a subset of brokers before dying. Depending on which broker a client connects 
to, it may then get an exception or a success. I do agree, though, that since 
the controller would not have responded back with success, the original renew 
request should be retried, and most likely the scenario can be avoided.
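
As a minimal sketch of that retry, assuming a hypothetical renewToken call that 
any broker forwards to the controller, the client side could look like this:

{code:java}
// Hypothetical client-side retry around renewal. renewToken() and
// TokenRenewalException are illustrative names only; the point is that a
// renewal the controller never acknowledged is safe to simply retry.
import java.util.concurrent.TimeUnit;

public class TokenRenewer {

    /** Hypothetical client API: ask any broker to renew; the broker forwards to the controller. */
    interface TokenClient {
        long renewToken(String tokenId) throws TokenRenewalException;
    }

    static class TokenRenewalException extends Exception {
        TokenRenewalException(String message) { super(message); }
    }

    private final TokenClient client;

    TokenRenewer(TokenClient client) {
        this.client = client;
    }

    /** Retry until the controller acknowledges the renewal or we give up. */
    long renewWithRetry(String tokenId, int maxAttempts)
            throws TokenRenewalException, InterruptedException {
        TokenRenewalException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.renewToken(tokenId);        // new expiry timestamp
            } catch (TokenRenewalException e) {
                lastFailure = e;                          // controller may have died mid-push
                TimeUnit.SECONDS.sleep(Math.min(1L << attempt, 30L)); // simple backoff
            }
        }
        throw lastFailure;
    }
}
{code}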

If the above steps seem right, here are the advantages and disadvantages of 
this approach:

Advantages:
* Token generation/renewal will not involve zookeeper. I am not too worried 
about the extra load this puts on zookeeper, but it definitely seems more 
secure and follows the Hadoop model more closely. However, zookeeper needs to 
be secure for a lot of other things in Kafka anyway, so I am not sure this 
should really be a concern.
* Clients will get better consistency.

Disadvantages:
* We will have to add new APIs to support the controller pushing tokens to 
brokers, on top of the minimal APIs that are currently proposed. I like the 
publicly available APIs to be minimal, and I like them to be something we 
expect clients to use; this also adds more development complexity. Overall 
this seems like a more philosophical point, so depending on who you ask it may 
or may not be seen as a disadvantage. 
* We will also have to add APIs to support the bootstrapping case. What I mean 
is, when a new broker comes up it will have to get all delegation tokens from 
the controller, so we will again need new APIs like getAllTokens (a rough 
sketch follows this list). Again, some of us may see that as a disadvantage 
and some may not.
* In catastrophic failures where all brokers go down, the tokens will be lost 
even if the servers are restarted, as the tokens are not persisted anywhere. 
Granted, if something like this happens the customer has bigger things to 
worry about, but not having to regenerate and redistribute tokens is one less 
thing to deal with.
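
For the bootstrapping point above, a rough sketch of what that could look like 
is below; getAllTokens and the surrounding types are hypothetical names used 
only for illustration.

{code:java}
// Hypothetical sketch of the bootstrap path: on startup a broker asks the
// controller for the full token cache. getAllTokens and the types below are
// illustrative placeholders, not an actual API proposal.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BrokerTokenCache {

    /** Hypothetical token payload. */
    record DelegationToken(String tokenId, String owner, long expiryMs, byte[] hmac) {}

    /** Hypothetical broker -> controller API added for bootstrapping. */
    interface ControllerClient {
        List<DelegationToken> getAllTokens();
    }

    private final Map<String, DelegationToken> tokens = new ConcurrentHashMap<>();

    /** Called once on broker startup, before token-authenticated clients are accepted. */
    void bootstrap(ControllerClient controller) {
        for (DelegationToken token : controller.getAllTokens()) {
            tokens.put(token.tokenId(), token);
        }
    }

    /** Later changes arrive through the controller's push API described earlier. */
    void update(DelegationToken token) {
        tokens.put(token.tokenId(), token);
    }
}
{code}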

I don't see strong reasons to go one way or the other, so I would still like 
to go with zookeeper, but I don't feel strongly about it. If you think I have 
mischaracterized what you were proposing, feel free to add more details or 
list any other advantages/disadvantages.



> Kafka should be able to generate Hadoop delegation tokens
> ---------------------------------------------------------
>
>                 Key: KAFKA-1696
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1696
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Jay Kreps
>            Assignee: Parth Brahmbhatt
>
> For access from MapReduce/etc jobs run on behalf of a user.


