[ 
https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844971#comment-17844971
 ] 

ASF GitHub Bot commented on HADOOP-18851:
-----------------------------------------

vikaskr22 commented on PR #6803:
URL: https://github.com/apache/hadoop/pull/6803#issuecomment-2102594497

   Hi @Hexiaoqiao , technically I didn't see or observe any issues till now but 
@ChenSammi has some concerns related to concurrency in other sub classes of 
AbstractDelegationTokenSecretManager.java .
   
   I had done testing with  RangerKMS  and there 
ZKDelegationTokenSecretManager.java is used as implementing class. That means, 
our test suite only verifies ZK specific cases. There are other implementation 
that is for HDFS federation, YARN etc. I 
   don't have any test suite that ensures concurrency stability for other 
implementations as well. 
   
   @ChenSammi point is, as part of last commit, I removed "synchronization" 
from super class and took care of concurrency in ZK sub class. So I was 
breaking concurrency at API level and all the implementing classes needs to 
take care of that. and at the same time, I don't have test suite internally 
that verifies anything except Ranger KMS DT usage.
   
   One more point she raised  that, what about any other subclass outside this 
repo? They might have written their code in assumption that lock has been 
acquired at API level. 
   
   Although I went through the source code of other implementing classes, and 
for me it still seems good. But due to lack of automated test suite and 
concurrency nature, I agreed to implement using Lock API.
   
   Using ReadWriteLock API, at least I am able to unblock multiple reader 
threads, like verifyToken(). Now verifyToken() can be invoked by multiple 
threads due to ReadLock nature. And write lock will ensure thread safety at API 
level.
   
   @ChenSammi , Please add your input as well in case I missed anything. Thanks.




> Performance improvement for DelegationTokenSecretManager.
> ---------------------------------------------------------
>
>                 Key: HADOOP-18851
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18851
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: common
>    Affects Versions: 3.4.0
>            Reporter: Vikas Kumar
>            Assignee: Vikas Kumar
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: 
> 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot 
> 2023-08-16 at 5.36.57 PM.png
>
>
> *Context:*
> KMS depends on hadoop-common for DT management. Recently we were analysing 
> one performance issue and following is out findings:
>  # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at 
> following:
>  ## *AbstractDelegationTokenSecretManager.verifyToken()*
>  ## *AbstractDelegationTokenSecretManager.createPassword()* 
>  # And then process crashed.
>  
> {code:java}
> http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X00007F075C157800NATIVE ID : 
> 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474)
> - waiting to lock <0x00000005f2f545e8> (a 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396)
> at  {code}
> All the 199 out of 200 were blocked at above point.
> And the lock they are waiting for is acquired by a thread that was trying to 
> createPassword and publishing the same on ZK.
>  
> {code:java}
> stackTrace:
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598)
> - locked <0x0000000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235)
> at 
> org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398)
> at 
> org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
> at 
> org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382)
> at 
> org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358)
> at 
> org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36)
> at 
> org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201)
> at 
> org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116)
> at 
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586)
> at 
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402)
> - locked <0x00000005f2f545e8> (a 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48)
> at org.apache.hadoop.security.token.Token.<init>(Token.java:67)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183)
>  {code}
> We can say that this thread is slow and has blocked remaining all. But 
> following is my observation:
>  
>  # verifyToken() and createPaswword() has been synchronized because one is 
> reading the tokenMap and another is updating the map. If it's only to protect 
> the map, then can't we simply use ConcurrentHashMap and remove the 
> "synchronized" keyword. Because due to this, all reader threads ( to 
> verifyToken()) are also blocking each other.
>  # IN HA env, It is recommended to use ZK to store DTs. We know that 
> CuratorFramework is thread safe.  
> ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum() only requires 
> to be protected from concurrent execution and it should be protected using 
> some other locks instead of "this". 
>  # With these changes, verifyToken() and createPaswword() will not block each 
> other. It will be blocked only at the time of updating the map.
>  # Similarly other methods can also be considered but these two are critical.
> I made these changes on my local and got the significant performance 
> improvement. 
> I request community to provide their input and if we agree, I can provide the 
> patch. Please let me know if any other details are required.
> Thanks. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to