[ 
https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xudong Cao updated HDFS-14693:
------------------------------
    Description: 
In a production environment, JournalNodes (JNs) can differ from one another 
(e.g. in network or disk conditions). For example, if one JN's network is much 
worse than the others', the NameNode (NN) takes much longer to write to that JN 
than to the rest. As a result, the IPC Logger thread for that JN accumulates 
many pending edits. Once the pending edits exceed the maximum limit (default 
10MB), new edits destined for that JN are silently dropped. This leaves gaps in 
the JN's editlog segment and causes the JN and the NN to repeatedly report the 
following error:
 
{code:java}
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write 
txid 1904164873 expecting nextTxId=1904164871{code}
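
To illustrate how a dropped batch turns into this error, here is a minimal 
sketch of a JN-side sequence check. The class and method names are 
hypothetical and are not taken from the Hadoop source; the sketch only assumes 
that a segment rejects any batch whose first txid is not the next one expected.

```java
/** Simplified model of one editlog segment on a JN (hypothetical, not Hadoop code). */
class JournalSegmentModel {
    // The txid the segment expects to receive next.
    private long nextTxId;

    JournalSegmentModel(long firstTxId) {
        this.nextTxId = firstTxId;
    }

    /** Accepts a batch only if it starts exactly at the expected txid. */
    void write(long firstTxId, int numTxns) {
        if (firstTxId != nextTxId) {
            // Assumption: a generic exception stands in for JournalOutOfSyncException.
            throw new IllegalStateException(
                "Can't write txid " + firstTxId + " expecting nextTxId=" + nextTxId);
        }
        nextTxId += numTxns;
    }

    long nextTxId() {
        return nextTxId;
    }

    public static void main(String[] args) {
        JournalSegmentModel segment = new JournalSegmentModel(1904164871L);
        // The batch holding txids 1904164871 and 1904164872 was dropped by the
        // sender, so the next batch to arrive starts at 1904164873.
        try {
            segment.write(1904164873L, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model every later batch fails the same check, because the expected 
txid never advances once a batch has been lost. That matches the repeated 
errors described above.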
 

Unfortunately, this error message does not help us find the root cause 
quickly, so it would be better to add a warning log that states the real 
reason, for example:

 
{code:java}
2019-08-02 04:55:05,879 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 
192.168.202.13:8485 is going to exceed limit size:10240, current queued edits 
size:10224, will silently drop 174 bytes of edits!{code}
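
The proposed check could look roughly like the sketch below. It is a 
simplified model rather than the actual patch: the class 
`PendingEditsTracker`, its methods, and the use of `java.util.logging` are all 
assumptions made for illustration. It tracks the queued bytes for one JN and 
logs a warning before dropping an edit that would push the queue over the 
limit.

```java
import java.util.logging.Logger;

/** Simplified per-JN pending-edits accounting (hypothetical, not Hadoop code). */
class PendingEditsTracker {
    private static final Logger LOG =
        Logger.getLogger(PendingEditsTracker.class.getName());

    private final String journalNodeAddr;
    private final long limitBytes;
    private long queuedBytes = 0;

    PendingEditsTracker(String journalNodeAddr, long limitBytes) {
        this.journalNodeAddr = journalNodeAddr;
        this.limitBytes = limitBytes;
    }

    /** Reserves queue space for an edit; returns false if the edit is dropped. */
    synchronized boolean tryReserve(long editBytes) {
        if (queuedBytes + editBytes > limitBytes) {
            // The warning makes the otherwise silent drop visible in the NN log.
            LOG.warning(String.format(
                "Pending edits to %s is going to exceed limit size:%d, "
                    + "current queued edits size:%d, will drop %d bytes of edits!",
                journalNodeAddr, limitBytes, queuedBytes, editBytes));
            return false;
        }
        queuedBytes += editBytes;
        return true;
    }

    /** Releases space once an edit has been sent to the JN. */
    synchronized void release(long editBytes) {
        queuedBytes = Math.max(0, queuedBytes - editBytes);
    }

    synchronized long queuedBytes() {
        return queuedBytes;
    }

    public static void main(String[] args) {
        // Values taken from the sample log line above.
        PendingEditsTracker tracker =
            new PendingEditsTracker("192.168.202.13:8485", 10240);
        tracker.tryReserve(10224); // accepted, queue is now nearly full
        tracker.tryReserve(174);   // rejected, triggers the warning
    }
}
```

Logging at the point of the drop ties the later out-of-sync errors to a 
specific JN and queue size, which is the information the existing error 
message lacks.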
 

This is just a very small improvement.

> NameNode should log a warning when EditLog IPC logger's pending size exceeds 
> limit.
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-14693
>                 URL: https://issues.apache.org/jira/browse/HDFS-14693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Xudong Cao
>            Priority: Minor


