[https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764809#comment-17764809]
ASF GitHub Bot commented on HADOOP-18851:
-----------------------------------------
hadoop-yetus commented on PR #6001:
URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1718056474
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 58s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 28s | | trunk passed |
| +1 :green_heart: | compile | 17m 7s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 15m 59s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 1m 23s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 47s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 59s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 2m 50s | | trunk passed |
| +1 :green_heart: | shadedclient | 37m 42s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 0s | | the patch passed |
| +1 :green_heart: | compile | 17m 21s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 21s | | the patch passed |
| +1 :green_heart: | compile | 16m 16s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 16m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 13s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 41s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 13s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 2m 44s | | the patch passed |
| +1 :green_heart: | shadedclient | 38m 39s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 38s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. |
| | | 230m 30s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6001 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux a2e40121feb1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d5eded867b72eb6d5018ea63ee20bde882f21b29 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/testReport/ |
| Max. process+thread count | 1278 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
> AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
> -------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18851
> URL: https://issues.apache.org/jira/browse/HADOOP-18851
> Project: Hadoop Common
> Issue Type: Task
> Reporter: Vikas Kumar
> Assignee: Vikas Kumar
> Priority: Major
> Labels: pull-request-available
> Attachments: 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot 2023-08-16 at 5.36.57 PM.png
>
>
> *Context:*
> KMS depends on hadoop-common for delegation token (DT) management. While analysing a recent performance issue, we found the following:
> # Around 98% (196 out of 200) of the KMS container threads were in the BLOCKED state at one of:
> ## *AbstractDelegationTokenSecretManager.verifyToken()*
> ## *AbstractDelegationTokenSecretManager.createPassword()*
> # The process then crashed.
>
> {code:java}
> http-nio-9292-exec-200
> PRIORITY : 5
> THREAD ID : 0X00007F075C157800
> NATIVE ID : 0X2C87F
> NATIVE ID (DECIMAL) : 182399
> STATE : BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474)
>     - waiting to lock <0x00000005f2f545e8> (a org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager)
>     at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213)
>     at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396)
>     ...
> {code}
> 199 of the 200 threads were blocked at the point above.
> The lock they were waiting for was held by a thread that was executing createPassword() and publishing the incremented DT sequence number to ZK.
>
> {code:java}
> stackTrace:
> java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:502)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598)
>     - locked <0x0000000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235)
>     at org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398)
>     at org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
>     at org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382)
>     at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358)
>     at org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36)
>     at org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201)
>     at org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116)
>     at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586)
>     at org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402)
>     - locked <0x00000005f2f545e8> (a org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager)
>     at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48)
>     at org.apache.hadoop.security.token.Token.<init>(Token.java:67)
>     at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183)
> {code}
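The contention in the two thread dumps above can be reproduced in isolation. The following is a minimal Java sketch and is not Hadoop code. `CoarseLockedSecretManager`, `CoarseLockDemo` and `observeReaderState()` are hypothetical names. A `HashMap<String, byte[]>` stands in for the token map, and a `CountDownLatch` wait stands in for the ZooKeeper `setData()` call. Both methods are `synchronized` on the same instance, and `CountDownLatch.await()` (like the real ZK wait on a different object) does not release that monitor. As a result, a reader calling `verifyToken()` during the simulated ZK call should show up as `BLOCKED`, which matches the first dump.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CoarseLockDemo {
  public static void main(String[] args) {
    System.out.println("reader state while writer holds the monitor: " + observeReaderState());
  }

  // Starts a writer that parks inside createPassword(), then reports the state
  // of a reader thread that calls verifyToken() on the same instance.
  static Thread.State observeReaderState() {
    CountDownLatch entered = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    CoarseLockedSecretManager mgr = new CoarseLockedSecretManager(entered, release);
    Thread writer = new Thread(() -> mgr.createPassword("t1"));
    Thread reader = new Thread(() -> mgr.verifyToken("t1"));
    try {
      writer.start();
      entered.await();                  // the writer now holds the monitor
      reader.start();
      // Poll until the reader parks on the monitor, giving up after about 5 s.
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
      while (reader.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
      Thread.State observed = reader.getState();
      release.countDown();              // let the simulated ZK call finish
      writer.join();
      reader.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}

// Hypothetical stand-in for the pre-patch pattern: both methods synchronize on
// "this", and createPassword() performs a slow remote call while holding it.
class CoarseLockedSecretManager {
  private final Map<String, byte[]> tokens = new HashMap<>();
  private final CountDownLatch entered;
  private final CountDownLatch release;
  private int seqNum = 0;

  CoarseLockedSecretManager(CountDownLatch entered, CountDownLatch release) {
    this.entered = entered;
    this.release = release;
  }

  synchronized byte[] createPassword(String id) {
    entered.countDown();
    try {
      release.await();                  // stands in for ZooKeeper setData(); "this" stays locked
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    byte[] password = (id + ":" + (++seqNum)).getBytes(StandardCharsets.UTF_8);
    tokens.put(id, password);
    return password;
  }

  // Pure read, but it still has to queue for the same monitor.
  synchronized boolean verifyToken(String id) {
    return tokens.containsKey(id);
  }
}
```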
> One could say that this thread is slow and has blocked all the others. However, my observations are as follows:
>
> # verifyToken() and createPassword() are synchronized because one reads the tokenMap and the other updates it. If the lock exists only to protect the map, we could use a ConcurrentHashMap and remove the "synchronized" keyword. As things stand, all reader threads (calling verifyToken()) also block each other.
> # In an HA environment, it is recommended to use ZK to store DTs, and CuratorFramework is thread-safe. ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum() only needs to be protected from concurrent execution, and that protection should come from a separate lock rather than "this".
> # With these changes, verifyToken() and createPassword() will not block each other; they will contend only while the map is being updated.
> # Other methods could be treated similarly, but these two are the critical ones.
> I made these changes locally and saw a significant performance improvement.
> I would like the community's input. If we agree on the approach, I can provide the patch. Please let me know if any other details are required.
> Thanks.
>
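The lock split proposed in points 1 to 3 of the description can be sketched as follows. This uses a simplified model and is not the attached patch. `SplitLockSecretManager`, `SplitLockDemo`, `seqLock`, `setSlowSeqHook()` and `verifyWhileSeqUpdateInFlight()` are hypothetical names. A plain `int` guarded by a private lock stands in for the Curator `SharedCount` behind `incrementDelegationTokenSeqNum()`. The hook is test scaffolding that simulates a slow ZK write. The real `AbstractDelegationTokenSecretManager` also has other shared state (such as the master keys) that this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class SplitLockDemo {
  public static void main(String[] args) {
    System.out.println("verified while seq update in flight: " + verifyWhileSeqUpdateInFlight());
  }

  // One thread parks inside the sequence-number update (standing in for a slow
  // ZooKeeper write); the calling thread then verifies an existing token.
  static boolean verifyWhileSeqUpdateInFlight() {
    SplitLockSecretManager mgr = new SplitLockSecretManager();
    mgr.createPassword("existing");          // no artificial delay yet
    CountDownLatch entered = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    mgr.setSlowSeqHook(entered, release);
    Thread writer = new Thread(() -> mgr.createPassword("new"));
    writer.start();
    try {
      entered.await();                       // the writer is now "inside ZK"
      // With a single "this" monitor this call would wait for the writer;
      // here it only reads the ConcurrentHashMap and returns immediately.
      boolean verifiedDuringWrite = mgr.verifyToken("existing");
      release.countDown();
      writer.join();
      return verifiedDuringWrite && mgr.verifyToken("new");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}

// Hypothetical sketch of the proposed locking split, not the HADOOP-18851 patch.
class SplitLockSecretManager {
  private final Map<String, byte[]> tokens = new ConcurrentHashMap<>();
  private final Object seqLock = new Object(); // guards only seqNum
  private int seqNum = 0;
  // Test-only hook that simulates a slow remote increment (hypothetical).
  private volatile CountDownLatch hookEntered;
  private volatile CountDownLatch hookRelease;

  void setSlowSeqHook(CountDownLatch entered, CountDownLatch release) {
    hookEntered = entered;
    hookRelease = release;
  }

  private int incrementSeqNum() {
    synchronized (seqLock) {
      CountDownLatch entered = hookEntered;
      CountDownLatch release = hookRelease;
      if (entered != null && release != null) {
        entered.countDown();
        try {
          release.await();                   // simulated ZK round trip under seqLock only
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        }
      }
      return ++seqNum;
    }
  }

  // Only the counter update takes seqLock; the map write does not.
  byte[] createPassword(String id) {
    int seq = incrementSeqNum();
    byte[] password = (id + ":" + seq).getBytes(StandardCharsets.UTF_8);
    tokens.put(id, password);
    return password;
  }

  // Readers never take seqLock, so they do not queue behind a slow increment.
  boolean verifyToken(String id) {
    return tokens.containsKey(id);
  }
}
```

The design choice here is that `seqLock` serializes only the counter update, so readers go straight to the map. A `ConcurrentHashMap` makes individual map operations thread-safe, but any compound check-then-act on token state would still need its own coordination.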
--
This message was sent by Atlassian Jira
(v8.20.10#820010)