[
https://issues.apache.org/jira/browse/HDFS-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312087#comment-17312087
]
JiangHua Zhu commented on HDFS-15887:
-------------------------------------
In our cluster, when a checkpoint occurs, the write lock is held for tens of
seconds:
2021-03-31 09:47:29,719 [674150524] - INFO [Edit log
tailer:FSNamesystemLock@261] - FSNamesystem write lock held for 83843 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1596)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:278)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:360)
Under normal circumstances, the lock is only held for a few seconds:
2021-03-31 09:17:09,459 [672330264] - INFO [Edit log
tailer:FSNamesystemLock@261] - FSNamesystem write lock held for 5833 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1596)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:278)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:360)
> Make LogRoll and TailEdits execute in parallel
> ----------------------------------------------
>
> Key: HDFS-15887
> URL: https://issues.apache.org/jira/browse/HDFS-15887
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major
> Labels: pull-request-available
> Attachments: edit_files.jpg
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In the EditLogTailer class, LogRoll and TailEdits are executed in a single
> thread, and when a checkpoint occurs, it competes with TailEdits for a lock
> (FSNamesystem#cpLock).
> A checkpoint usually takes a long time to execute, which causes the
> generated edit log files to become relatively large.
> For example, here is an actual case (see edit_files.jpg); the
> StandbyCheckpointer log shows the following:
> 2021-03-11 09:18:42,513 [769071096]-INFO [Standby State
> Checkpointer:StandbyCheckpointer$CheckpointerThread@335]-Triggering
> checkpoint because there have been 5142154 txns since the last checkpoint,
> which exceeds the configured threshold 1000000
> Loading an edit log that contains a large amount of data takes longer. We
> should keep the edit log file sizes as even as possible, which is good for
> the operation of the system.
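
A minimal sketch of the idea, in case it helps the discussion. This is not the
actual patch; the class and method names below are illustrative placeholders,
not from the HDFS code. The point is to submit the rollEditLog request to its
own executor so that the tailing loop is not blocked while the active NameNode
rolls its edit log.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTailerSketch {
  // Dedicated thread for log-roll requests, separate from the tailing loop.
  private final ExecutorService rollExecutor =
      Executors.newSingleThreadExecutor();
  private volatile Future<?> pendingRoll;

  /** Ask the active NameNode to roll its edit log without blocking tailing. */
  void triggerLogRollAsync(Runnable rollEditLogRpc) {
    // Skip if a previous roll request is still in flight.
    if (pendingRoll == null || pendingRoll.isDone()) {
      pendingRoll = rollExecutor.submit(rollEditLogRpc);
    }
  }

  /** The tailing loop keeps applying edits while the roll runs in parallel. */
  void tailLoop(Runnable tailEdits, Runnable rollEditLogRpc)
      throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      triggerLogRollAsync(rollEditLogRpc);  // non-blocking roll request
      tailEdits.run();                      // apply newly available edits
      Thread.sleep(60_000L);                // tail interval (placeholder value)
    }
  }
}

With the roll request off the tailing thread, edits keep being applied at the
configured interval even while a checkpoint or roll is in progress, so the
edit log segments should stay smaller and more even in size.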
--
This message was sent by Atlassian Jira
(v8.3.4#803005)