[ 
https://issues.apache.org/jira/browse/HDFS-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

snodawn updated HDFS-15544:
---------------------------
    Attachment: HDFS-15544.001.patch

> Standby namenode EditLogTailerThread shouldn't acquire a lock interruptibly 
> when tailing edits
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15544
>                 URL: https://issues.apache.org/jira/browse/HDFS-15544
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: snodawn
>            Priority: Major
>         Attachments: HDFS-15544.001.patch
>
>
> In my experience, the active namenode sometimes holds the write lock for a 
> long time in rollEditLog:
> {code:java}
> Longest write-lock held at 2020-08-27 12:59:30,773+0800 for 66067ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:283)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:258)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1610)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4667)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1292)
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:146)
> {code}
> This happens because the standby namenode may not call 
> triggerActiveLogRoll() at the interval set in dfs.ha.log-roll.period after 
> its last checkpoint, which can leave the active namenode with a large edit 
> log to roll.
>  
> When tailing edits, the standby namenode's EditLogTailerThread acquires the 
> same lock as the checkpoint thread. The checkpoint thread may take a long 
> time to save the fsimage file (in my experience, 4 minutes), so the tailer 
> thread blocks on the lock and triggerActiveLogRoll() in EditLogTailerThread 
> is not called at the interval set in dfs.ha.log-roll.period.
> I propose that EditLogTailerThread shouldn't acquire the lock with 
> cpLockInterruptibly(); tryLock() is enough.
>  
>  
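The difference between the two locking styles described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the content of HDFS-15544.001.patch: the class TailerLockSketch, the field cpLock, and the methods tailOnce() and demo() are hypothetical stand-ins, and a java.util.concurrent ReentrantLock plays the role of the lock shared by the checkpoint thread and the tailer thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the tailer loop; not the actual Hadoop code. */
class TailerLockSketch {
    // Assumption: one lock shared by the checkpoint thread and the tailer thread.
    static final ReentrantLock cpLock = new ReentrantLock();

    /** One tailer iteration: returns true if edits were tailed, false if skipped. */
    static boolean tailOnce(Runnable doTailEdits) {
        // tryLock() returns immediately, so the caller is never parked behind
        // a long checkpoint and can go on to its periodic log-roll check.
        if (!cpLock.tryLock()) {
            return false;
        }
        try {
            doTailEdits.run();
            return true;
        } finally {
            cpLock.unlock();
        }
    }

    /** Returns {tailed while checkpoint held the lock, tailed after release}. */
    static boolean[] demo() {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread checkpointer = new Thread(() -> {
            cpLock.lock();
            try {
                held.countDown();
                release.await(); // simulates a long fsimage save
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cpLock.unlock();
            }
        });
        checkpointer.start();
        try {
            held.await();                             // checkpoint now owns the lock
            boolean during = tailOnce(() -> { });     // skipped, does not block
            release.countDown();
            checkpointer.join();
            boolean after = tailOnce(() -> { });      // lock is free again
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "false true"
    }
}
```

With lockInterruptibly() in place of tryLock(), the first tailOnce() call would instead wait until the simulated checkpoint finished, which is the stall the report describes.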



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
