[ https://issues.apache.org/jira/browse/HADOOP-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049589#comment-17049589 ]
Xiaoyu Yao commented on HADOOP-16828:
-------------------------------------
Thanks [~fengnanli] for reporting the issue and providing the patch. The patch
LGTM overall, and the performance improvement is impressive. Here are a few
minor comments.
ZKDelegationTokenSecretManager.java
Line 100: NIT: can we add "token" to the prefix of the new key, i.e.
"token.seqnum.batch.size"?
Line 559: getDelegationTokenSeqNum() needs to change, because
delTokenSeqCounter.getCount() will now be advanced in batches and no longer
reflects the last number issued. It should return currentSeqNum instead.
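A minimal sketch of why the getter has to change, assuming a manager that keeps a local range ending at a hypothetical currentMaxSeqNum and goes back to the shared counter only when that range is used up. SharedSeqCounter, BatchedSeqNumManager and reserve() are illustrative names, not the patch's code, and an in-memory AtomicInteger stands in for the ZooKeeper-backed delTokenSeqCounter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for the ZooKeeper-backed delTokenSeqCounter.
class SharedSeqCounter {
  private final AtomicInteger count = new AtomicInteger(0);

  /** Atomically adds delta and returns the value before the add. */
  int reserve(int delta) {
    return count.getAndAdd(delta);
  }

  int getCount() {
    return count.get();
  }
}

// Hypothetical manager that reserves sequence numbers in batches of batchSize.
class BatchedSeqNumManager {
  private final SharedSeqCounter delTokenSeqCounter;
  private final int batchSize;
  private int currentSeqNum;    // last number handed out by this manager
  private int currentMaxSeqNum; // last number (inclusive) of the reserved range

  BatchedSeqNumManager(SharedSeqCounter counter, int batchSize) {
    this.delTokenSeqCounter = counter;
    this.batchSize = batchSize;
  }

  synchronized int incrementDelegationTokenSeqNum() {
    if (currentSeqNum >= currentMaxSeqNum) {
      // Local range used up: one shared-counter update reserves a whole batch.
      currentSeqNum = delTokenSeqCounter.reserve(batchSize);
      currentMaxSeqNum = currentSeqNum + batchSize;
    }
    return ++currentSeqNum;
  }

  // Report the locally issued value; the shared counter already points at
  // the end of the reserved batch.
  synchronized int getDelegationTokenSeqNum() {
    return currentSeqNum;
  }
}
```

With a batch size of 1000, after the first token the shared counter already reads 1000 while the manager has issued only 1, so returning delTokenSeqCounter.getCount() would overstate the sequence number.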
TestZKDelegationTokenSecretManager.java
As shown in the test, a large batch size, say 1000, may leave holes in the
sequence numbers when KMS fails over. That may be an acceptable tradeoff.
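To illustrate the hole, here is a hypothetical model (RangeAllocator and FailoverGap are illustrative names, tm1 and tm2 merely echo the test, and an AtomicInteger replaces the ZooKeeper counter): one instance issues a few numbers and disappears with the rest of its batch unused, then a replacement reserves the next batch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator that reserves [next, next + batchSize) from a shared
// counter; the AtomicInteger stands in for the ZooKeeper counter.
class RangeAllocator {
  private final AtomicInteger shared;
  private final int batchSize;
  private int next; // next number to hand out
  private int end;  // exclusive end of the reserved range

  RangeAllocator(AtomicInteger shared, int batchSize) {
    this.shared = shared;
    this.batchSize = batchSize;
  }

  int nextSeqNum() {
    if (next >= end) {
      // The shared counter records the end of the most recent reservation.
      next = shared.getAndAdd(batchSize) + 1;
      end = next + batchSize;
    }
    return next++;
  }
}

class FailoverGap {
  /** First number issued by a replacement instance after the original is lost. */
  static int firstSeqNumAfterFailover(int batchSize, int issuedBeforeFailover) {
    AtomicInteger shared = new AtomicInteger(0);
    RangeAllocator tm1 = new RangeAllocator(shared, batchSize);
    for (int i = 0; i < issuedBeforeFailover; i++) {
      tm1.nextSeqNum();
    }
    // tm1 goes away here; the unused part of its batch is never handed out.
    RangeAllocator tm2 = new RangeAllocator(shared, batchSize);
    return tm2.nextSeqNum();
  }
}
```

With a batch size of 1000 and three tokens issued before the failover, the replacement starts at 1001, so 4 through 1000 are never used; with a batch size of 1 it starts at 4 and there is no gap. Uniqueness is preserved either way, only density is lost.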
Please ensure the DTSM instances (tm1, tm2) are properly destroyed after the
test by calling verifyDestroy().
> Zookeeper Delegation Token Manager fetch sequence number by batch
> -----------------------------------------------------------------
>
> Key: HADOOP-16828
> URL: https://issues.apache.org/jira/browse/HADOOP-16828
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HADOOP-16828.001.patch, Screen Shot 2020-01-25 at 2.25.06 PM.png,
> Screen Shot 2020-01-25 at 2.25.16 PM.png, Screen Shot 2020-01-25 at 2.25.24 PM.png
>
>
> Currently, ZKDelegationTokenSecretManager.java increments the sequence number
> by 1 each time a new token is requested, and every increment is a request to
> the ZooKeeper server. With multiple managers running, they contend for the
> same counter. Because the increment logic uses tryAndSet, which is optimistic
> concurrency control without locking, a manager that loses the race has to
> re-read the counter and try again. This contention degrades performance when
> the secret managers are under heavy traffic.
> The change here is to fetch sequence numbers in batches instead of one at a
> time, which reduces the traffic sent to ZK and lets most increments happen in
> the secret manager's own memory.
> After putting this into production, we saw a large improvement in the RPC
> processing latency of getDelegationToken calls. Also, because ZK receives
> less traffic this way, other write calls, such as renewing and cancelling
> delegation tokens, benefit from this change.
>
>
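The contention and the effect of batching described in the issue can be modeled in a few lines. This is a hypothetical sketch, not Hadoop's code: OptimisticSeqCounter, add() and simulate() are illustrative names, threads play the role of secret managers, and AtomicReference.compareAndSet on a versioned snapshot stands in for the optimistic tryAndSet update against ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of a versioned shared counter updated optimistically.
final class OptimisticSeqCounter {
  private static final class Snapshot {
    final int value;
    final int version;
    Snapshot(int value, int version) {
      this.value = value;
      this.version = version;
    }
  }

  private final AtomicReference<Snapshot> state =
      new AtomicReference<>(new Snapshot(0, 0));
  final AtomicLong successfulUpdates = new AtomicLong(); // writes that won
  final AtomicLong conflicts = new AtomicLong();         // writes that lost and retried

  int value() {
    return state.get().value;
  }

  /** Optimistic add: read, try a conditional write, retry on conflict. */
  int add(int delta) {
    while (true) {
      Snapshot seen = state.get();
      Snapshot next = new Snapshot(seen.value + delta, seen.version + 1);
      if (state.compareAndSet(seen, next)) {
        successfulUpdates.incrementAndGet();
        return seen.value; // start of the reserved range
      }
      conflicts.incrementAndGet(); // another manager got there first
    }
  }

  /** Each thread issues tokensPerManager tokens, refilling a local batch as needed. */
  static OptimisticSeqCounter simulate(int managers, int tokensPerManager, int batchSize) {
    OptimisticSeqCounter shared = new OptimisticSeqCounter();
    Thread[] workers = new Thread[managers];
    for (int m = 0; m < managers; m++) {
      workers[m] = new Thread(() -> {
        int remainingInBatch = 0;
        for (int t = 0; t < tokensPerManager; t++) {
          if (remainingInBatch == 0) {
            shared.add(batchSize); // the only access to the shared counter
            remainingInBatch = batchSize;
          }
          remainingInBatch--; // hand out one number from local memory
        }
      });
      workers[m].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return shared;
  }
}
```

With 4 managers issuing 10,000 tokens each, a batch size of 1 needs 40,000 successful shared-counter updates while a batch size of 1000 needs 40, and both end with the counter at 40,000. The conflicts count varies from run to run, but it can only grow with the number of updates attempted, which is the traffic batching removes.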
--
This message was sent by Atlassian Jira
(v8.3.4#803005)