[
https://issues.apache.org/jira/browse/HDFS-16387?focusedWorklogId=697680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-697680
]
ASF GitHub Bot logged work on HDFS-16387:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Dec/21 02:00
Start Date: 17/Dec/21 02:00
Worklog Time Spent: 10m
Work Description: jianghuazhu commented on pull request #3807:
URL: https://github.com/apache/hadoop/pull/3807#issuecomment-996373026
Some unit tests failed, but the failures appear to be transient.
Could you help review this PR, @xinglin @prasad-acit?
Thank you very much.
Issue Time Tracking
-------------------
Worklog Id: (was: 697680)
Time Spent: 0.5h (was: 20m)
> [FGL]Access to Create File is more secure
> -----------------------------------------
>
> Key: HDFS-16387
> URL: https://issues.apache.org/jira/browse/HDFS-16387
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: Fine-Grained Locking
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> After applying this patch, I used NNThroughputBenchmark to verify the create operation, for example:
> ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://xxxx -op create -threads 50 -files 2000000
> Across multiple runs, an error occasionally occurs. I found that deadlocks sometimes happen, such as:
> Found one Java-level deadlock:
> =============================
> "CacheReplicationMonitor(72357231)":
>   waiting for ownable synchronizer 0x00007f6a74c1aa50, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "IPC Server handler 49 on 8020"
> "IPC Server handler 49 on 8020":
>   waiting for ownable synchronizer 0x00007f6a74d14ec8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "IPC Server handler 24 on 8020"
> "IPC Server handler 24 on 8020":
>   waiting for ownable synchronizer 0x00007f69348ba648, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "IPC Server handler 49 on 8020"
> Java stack information for the threads listed above:
> ===================================================
> "CacheReplicationMonitor(72357231)":
>   at sun.misc.Unsafe.park(Native Method)
>   parking to wait for <0x00007f6a74c1aa50> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.doLock(FSNamesystemLock.java:386)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeLock(FSNamesystemLock.java:248)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:1587)
>   at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.rescan(CacheReplicationMonitor.java:288)
>   at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:189)
> "IPC Server handler 49 on 8020":
>   at sun.misc.Unsafe.park(Native Method)
>   parking to wait for <0x00007f6a74d14ec8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
>   at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
>   at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createMissingDirs(FSDirMkdirOp.java:92)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:372)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
> "IPC Server handler 24 on 8020":
>   at sun.misc.Unsafe.park(Native Method)
>   parking to wait for <0x00007f69348ba648> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>   at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
>   at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
>   at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirWriteFileOp.java:498)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:375)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
> Found 1 deadlock.
> I found that the FSDirWriteFileOp#startFile() path calls INodeMap#latchWriteLock() twice (once via FSDirMkdirOp#createMissingDirs() and once via FSDirWriteFileOp#addFile()), so two handlers can acquire the partition latches in different orders, which makes a deadlock possible.
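> For illustration, here is a minimal, self-contained Java sketch of that lock-ordering pattern. It is not the HDFS code; the class and the partitionA/partitionB latches are made-up stand-ins for two INodeMap partition child locks, and the point is only that taking two latches in opposite orders from two threads can reproduce a "Found one Java-level deadlock" report like the one above:
>
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> // Illustrative sketch only: two threads latch two partition locks in
> // opposite orders, so each may end up waiting for the lock the other holds.
> public class LatchOrderingDeadlockSketch {
>     private static final ReentrantReadWriteLock partitionA = new ReentrantReadWriteLock();
>     private static final ReentrantReadWriteLock partitionB = new ReentrantReadWriteLock();
>
>     private static void latchInOrder(ReentrantReadWriteLock first,
>                                      ReentrantReadWriteLock second,
>                                      String name) {
>         first.writeLock().lock();          // first latch acquired
>         try {
>             Thread.sleep(100);             // widen the race window
>             second.writeLock().lock();     // blocks forever if the other thread
>             try {                          // already holds this latch
>                 System.out.println(name + " latched both partitions");
>             } finally {
>                 second.writeLock().unlock();
>             }
>         } catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>         } finally {
>             first.writeLock().unlock();
>         }
>     }
>
>     public static void main(String[] args) throws InterruptedException {
>         // One handler latches A then B, the other B then A -- opposite orders.
>         Thread handler1 = new Thread(() -> latchInOrder(partitionA, partitionB, "handler-1"));
>         Thread handler2 = new Thread(() -> latchInOrder(partitionB, partitionA, "handler-2"));
>         handler1.start();
>         handler2.start();
>         handler1.join();                   // usually hangs; jstack then reports a
>         handler2.join();                   // Java-level deadlock, as in the trace above
>     }
> }
>
> In general, acquiring the latches in a single consistent order, or releasing the first latch before taking the second, removes this cycle.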