[ 
https://issues.apache.org/jira/browse/HDFS-16387?focusedWorklogId=697232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697232
 ]

ASF GitHub Bot logged work on HDFS-16387:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/21 12:49
            Start Date: 16/Dec/21 12:49
    Worklog Time Spent: 10m 
      Work Description: jianghuazhu opened a new pull request #3807:
URL: https://github.com/apache/hadoop/pull/3807


   
   ### Description of PR
   When files are created through RPC, a deadlock can sometimes be triggered.
   The purpose of this PR is to make file creation safer.
   Details: HDFS-16387
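   
   As a rough sketch only (a hypothetical helper with made-up names, not necessarily what this PR changes; see the diff for the actual fix), the usual way to make a two-latch code path like this safe is to acquire all the partition write locks an operation needs in one place and in a single global order:
   
   ```java
   import java.util.Arrays;
   import java.util.concurrent.locks.ReentrantReadWriteLock;
   
   // Hypothetical helper (not part of Hadoop or of this PR): latch several
   // partition write locks in a single ascending-index order, so no two
   // threads can ever hold them in conflicting orders. This removes the
   // AB/BA cycle that produces the deadlock described in HDFS-16387.
   class OrderedLatch {
       static void latchAll(ReentrantReadWriteLock[] partitions, int... indices) {
           int[] sorted = indices.clone();
           Arrays.sort(sorted);              // always acquire the lowest index first
           for (int i : sorted) {
               partitions[i].writeLock().lock();
           }
       }
   
       static void releaseAll(ReentrantReadWriteLock[] partitions, int... indices) {
           for (int i : indices) {
               partitions[i].writeLock().unlock();
           }
       }
   }
   ```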
   
   ### How was this patch tested?
   Verified with the existing tests; they pass.
   
   



Issue Time Tracking
-------------------

            Worklog Id:     (was: 697232)
    Remaining Estimate: 0h
            Time Spent: 10m

> [FGL]Access to Create File is more secure
> -----------------------------------------
>
>                 Key: HDFS-16387
>                 URL: https://issues.apache.org/jira/browse/HDFS-16387
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: Fine-Grained Locking
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I applied this patch and used NNThroughputBenchmark to verify the create operation, for example:
> ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://xxxx -op create -threads 50 -files 2000000
> running it multiple times would occasionally fail.
> I found that deadlocks sometimes occur, such as:
> Found one Java-level deadlock:
> =============================
> "CacheReplicationMonitor(72357231)":
> waiting for ownable synchronizer 0x00007f6a74c1aa50, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
> which is held by "IPC Server handler 49 on 8020"
> "IPC Server handler 49 on 8020":
> waiting for ownable synchronizer 0x00007f6a74d14ec8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
> which is held by "IPC Server handler 24 on 8020"
> "IPC Server handler 24 on 8020":
> waiting for ownable synchronizer 0x00007f69348ba648, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
> which is held by "IPC Server handler 49 on 8020"
> Java stack information for the threads listed above:
> ===================================================
> "CacheReplicationMonitor(72357231)":
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x00007f6a74c1aa50> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.doLock(FSNamesystemLock.java:386)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeLock(FSNamesystemLock.java:248)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:1587)
> at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.rescan(CacheReplicationMonitor.java:288)
> at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:189)
> "IPC Server handler 49 on 8020":
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x00007f6a74d14ec8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
> at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
> at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
> at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createMissingDirs(FSDirMkdirOp.java:92)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:372)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
> "IPC Server handler 24 on 8020":
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x00007f69348ba648> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
> at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
> at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirWriteFileOp.java:498)
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:375)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
> Found 1 deadlock.
> I found that INodeMap#latchWriteLock() is called twice within FSDirWriteFileOp#startFile(), which creates the possibility of a deadlock.
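> For illustration only (this standalone class and its names are hypothetical and not part of the NameNode code), the shape of the deadlock above is the classic two-lock ordering cycle: each handler thread latches one partition's write lock and then blocks trying to latch the other's:
>
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class LatchOrderDeadlock {
>     // Two partition-style locks, standing in for two INodeMap partition latches.
>     static final ReentrantReadWriteLock partitionA = new ReentrantReadWriteLock();
>     static final ReentrantReadWriteLock partitionB = new ReentrantReadWriteLock();
>
>     public static void main(String[] args) {
>         // Handler 1: latches A, then tries to latch B.
>         Thread handler1 = new Thread(() -> {
>             partitionA.writeLock().lock();
>             sleep(100);                        // widen the race window
>             partitionB.writeLock().lock();     // blocks: handler2 holds B
>             partitionB.writeLock().unlock();
>             partitionA.writeLock().unlock();
>         });
>         // Handler 2: latches B, then tries to latch A (the opposite order).
>         Thread handler2 = new Thread(() -> {
>             partitionB.writeLock().lock();
>             sleep(100);
>             partitionA.writeLock().lock();     // blocks: handler1 holds A
>             partitionA.writeLock().unlock();
>             partitionB.writeLock().unlock();
>         });
>         handler1.start();
>         handler2.start();
>         // Both threads now wait on each other forever; jstack reports
>         // "Found one Java-level deadlock", as in the dump above.
>     }
>
>     static void sleep(long ms) {
>         try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
>     }
> }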


