[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xudong Cao updated HDFS-14693:
------------------------------
    Description:
In a production environment, JournalNodes may differ from one another (e.g. in network condition, disk condition, and so on). For example, if one JN's network is much worse than the others', the NN takes much longer to write to that JN than to the other JNs. As a result, the IPC Logger thread corresponding to that JN accumulates many pending edits. Once the pending edits exceed the maximum limit (default 10MB), new edits about to be written to that JN are silently dropped. This leaves gaps in the editlog segment and causes that JN and the NN to repeatedly report the following error:

{code:java}
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code}

Unfortunately, the above error message does not help us quickly find the root cause, so it would be better to add a warning log that tells us the real reason, like this:

{code:java}
2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code}

This is just a very small improvement.

> NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-14693
>                 URL: https://issues.apache.org/jira/browse/HDFS-14693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Xudong Cao
>            Priority: Minor
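
For illustration only, the proposed behaviour (warn before an edit is dropped because the per-JN pending queue is full) could be sketched as below. This is a simplified stand-in, not the actual Hadoop code: the class `PendingEditsQueue` and its methods are hypothetical names, and it uses `java.util.logging` rather than Hadoop's logging so that it runs on its own. The numbers in `main` are taken from the example log line in the description.

```java
import java.util.logging.Logger;

/** Hypothetical per-JournalNode pending-edits accounting; not Hadoop code. */
class PendingEditsQueue {
  private static final Logger LOG =
      Logger.getLogger(PendingEditsQueue.class.getName());

  private final String journalNodeAddr; // e.g. "192.168.202.13:8485"
  private final long limitBytes;        // maximum bytes allowed to be pending
  private long queuedBytes = 0;         // bytes queued but not yet acknowledged

  PendingEditsQueue(String journalNodeAddr, long limitBytes) {
    this.journalNodeAddr = journalNodeAddr;
    this.limitBytes = limitBytes;
  }

  /**
   * Tries to queue an edit batch of the given size. If the batch would push
   * the pending size over the limit, logs a warning and drops it.
   *
   * @return true if the batch was queued, false if it was dropped
   */
  synchronized boolean offer(int editBytes) {
    if (queuedBytes + editBytes > limitBytes) {
      // The warning makes the drop visible instead of silent.
      LOG.warning(String.format(
          "Pending edits to %s is going to exceed limit size:%d, "
              + "current queued edits size:%d, will silently drop %d bytes of edits!",
          journalNodeAddr, limitBytes, queuedBytes, editBytes));
      return false;
    }
    queuedBytes += editBytes;
    return true;
  }

  /** Releases space once the JournalNode has acknowledged a batch. */
  synchronized void acked(int editBytes) {
    queuedBytes -= editBytes;
  }

  synchronized long getQueuedBytes() {
    return queuedBytes;
  }

  public static void main(String[] args) {
    PendingEditsQueue q = new PendingEditsQueue("192.168.202.13:8485", 10240);
    q.offer(10224); // fits within the limit, so it is queued
    q.offer(174);   // 10224 + 174 > 10240, so a warning is logged and it is dropped
  }
}
```

With such a warning in the NN log, the later JournalOutOfSyncException on the slow JN can be traced back to the dropped edits.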