[ 
https://issues.apache.org/jira/browse/HADOOP-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049589#comment-17049589
 ] 

Xiaoyu Yao commented on HADOOP-16828:
-------------------------------------

Thanks [~fengnanli] for reporting the issue and providing the patch. The patch 
LGTM overall. The performance improvement is impressive. Here are a few minor 
comments.

ZKDelegationTokenSecretManager.java

Line 100: NIT: can we add "token" as part of the prefix for the new key, 
i.e. "token.seqnum.batch.size"?

Line 559: getDelegationTokenSeqNum() needs to be changed, because 
delTokenSeqCounter.getCount() will now be updated in batches. We should return 
currentSeqNum here instead.
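
A minimal sketch of the point above, not the patch's actual code. It assumes 
an in-memory AtomicInteger as a stand-in for the ZooKeeper-backed 
delTokenSeqCounter; the class name BatchedSeqNumSketch and the fields 
sharedCounter, batchSize and currentMaxSeqNum are hypothetical names chosen 
for illustration. It shows why the shared counter's value is no longer the 
last issued sequence number once ranges are reserved in batches.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration of batched sequence-number allocation.
 * An AtomicInteger stands in for the shared ZooKeeper counter.
 */
final class BatchedSeqNumSketch {
    private final AtomicInteger sharedCounter; // stand-in for delTokenSeqCounter
    private final int batchSize;
    private int currentSeqNum;    // last sequence number handed out locally
    private int currentMaxSeqNum; // upper bound of the locally reserved range

    BatchedSeqNumSketch(AtomicInteger sharedCounter, int batchSize) {
        this.sharedCounter = sharedCounter;
        this.batchSize = batchSize;
        this.currentSeqNum = sharedCounter.get();
        // Equal bounds mean "nothing reserved yet": the first call reserves a batch.
        this.currentMaxSeqNum = this.currentSeqNum;
    }

    /** Hands out the next number, reserving a new batch only when the local range is used up. */
    synchronized int incrementDelegationTokenSeqNum() {
        if (currentSeqNum >= currentMaxSeqNum) {
            // One update of the shared counter reserves (max - batchSize, max].
            currentMaxSeqNum = sharedCounter.addAndGet(batchSize);
            currentSeqNum = currentMaxSeqNum - batchSize;
        }
        return ++currentSeqNum;
    }

    /** Returns the last locally issued number, not the shared counter value. */
    synchronized int getDelegationTokenSeqNum() {
        return currentSeqNum;
    }

    public static void main(String[] args) {
        AtomicInteger shared = new AtomicInteger(0);
        BatchedSeqNumSketch manager = new BatchedSeqNumSketch(shared, 1000);
        manager.incrementDelegationTokenSeqNum();
        System.out.println("last issued:    " + manager.getDelegationTokenSeqNum());
        System.out.println("shared counter: " + shared.get());
    }
}
```

In this sketch the shared counter jumps by the whole batch on the first 
request, so reading it back would overstate the last issued sequence number 
by up to batchSize - 1.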

TestZKDelegationTokenSecretManager.java
As shown in the test, if the batch size is large, say 1000, this might leave 
holes in the sequence numbers when KMS fails over. It might be an acceptable 
tradeoff. 
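
To make the size of such a hole concrete, here is a small simulation under 
stated assumptions: an AtomicInteger stands in for the shared counter, a 
manager that fails simply abandons the rest of its reserved batch, and the 
names SeqNumGapSketch, reserveBatchStart and skippedOnFailover are 
hypothetical. It is an illustration of the arithmetic, not code from the 
patch or the test.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical simulation of the gap left when a manager fails mid-batch. */
final class SeqNumGapSketch {

    /** Reserves one batch from the shared counter and returns the first number in it. */
    static int reserveBatchStart(AtomicInteger sharedCounter, int batchSize) {
        return sharedCounter.addAndGet(batchSize) - batchSize + 1;
    }

    /**
     * Manager A reserves a batch, issues some tokens, then fails.
     * Manager B reserves the next batch. Returns how many sequence
     * numbers between A's last token and B's first token are never used.
     */
    static int skippedOnFailover(int batchSize, int issuedBeforeFailure) {
        AtomicInteger sharedCounter = new AtomicInteger(0);
        int firstOfA = reserveBatchStart(sharedCounter, batchSize);
        int lastIssuedByA = firstOfA + issuedBeforeFailure - 1;
        int firstOfB = reserveBatchStart(sharedCounter, batchSize);
        return firstOfB - lastIssuedByA - 1;
    }

    public static void main(String[] args) {
        // Batch size 1000, three tokens issued before the failover.
        System.out.println(skippedOnFailover(1000, 3)); // prints 997
    }
}
```

With a batch size of 1 the same calculation gives a gap of 0, which matches 
the behaviour before batching; the hole is bounded by batchSize - 1 per 
failed manager.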

Please ensure the DTSM instances (tm1, tm2) are properly destroyed after the 
test by calling verifyDestroy(). 


> Zookeeper Delegation Token Manager fetch sequence number by batch
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16828
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Major
>         Attachments: HADOOP-16828.001.patch, Screen Shot 2020-01-25 at 
> 2.25.06 PM.png, Screen Shot 2020-01-25 at 2.25.16 PM.png, Screen Shot 
> 2020-01-25 at 2.25.24 PM.png
>
>
> Currently in ZKDelegationTokenSecretManager.java the sequence number is 
> incremented by 1 each time a new token is requested, and each increment 
> sends traffic to the ZooKeeper server. With multiple managers running, they 
> contend on this data. The current increment logic uses tryAndSet, which is 
> optimistic concurrency control without locking, so this contention degrades 
> performance when the secret managers are under heavy traffic.
> The change here is to fetch the sequence number in batches instead of one 
> at a time, which reduces the traffic sent to ZK and lets many operations 
> complete in the ZK secret manager's memory.
> After putting this into production we saw a huge improvement in the RPC 
> processing latency of get delegation token calls. Also, since ZK takes less 
> traffic this way, other write calls, such as renewing and cancelling 
> delegation tokens, benefit from this change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

