[ 
https://issues.apache.org/jira/browse/HDFS-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898444#comment-16898444
 ] 

Wei-Chiu Chuang commented on HDFS-14557:
----------------------------------------

The fix and test look really good!

Is the error "Header value is -1 indicating it was never written" too 
cryptic? I didn't understand it the first time I read it. Can we make it 
something that an ordinary administrator can understand? For example, the 
message could be more descriptive, like "the edit log file xxx will be 
sidelined to file name xxx.empty." 
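
To make the suggestion concrete, here is a rough sketch of what a more 
descriptive message could look like. It is not taken from the attached 
patch: the class name, the method names, and the exact wording are all 
hypothetical, and it assumes only what the current message says, namely 
that the first int of a never-written edit log file reads as -1.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names come from HDFS-14557.001.patch.
public class EditLogSidelineSketch {
    // Assumption: a header that was never written reads back as -1.
    static final int UNWRITTEN_HEADER = -1;

    // Reads the first four bytes of the file as a big-endian int.
    static int readHeader(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return in.readInt();
        }
    }

    // Builds a message that names the file, the likely cause, and the action taken.
    static String sidelineMessage(Path file, Path target) {
        return "Edit log file " + file + " has no valid header (the header value "
            + "read as -1), which usually means it was never written, for example "
            + "because the disk was full. The file will be sidelined to " + target + ".";
    }

    // Renames the file to <name>.empty if its header was never written.
    // Returns the new path, or null if the file was left in place.
    static Path sidelineIfUnwritten(Path file) throws IOException {
        if (readHeader(file) != UNWRITTEN_HEADER) {
            return null;
        }
        Path target = file.resolveSibling(file.getFileName() + ".empty");
        System.err.println(sidelineMessage(file, target));
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The point is only that the message names the file, the probable cause, and 
what the JournalNode did about it, so an administrator can act on it 
without reading the source.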

Another question: if the JN has indeed run out of disk space, sidelining the 
edit log file is not going to help much, right? Unless an administrator steps 
in and cleans up the space, the JN will not be able to return to a good state.

> JournalNode error: Can't scan a pre-transactional edit log
> ----------------------------------------------------------
>
>                 Key: HDFS-14557
>                 URL: https://issues.apache.org/jira/browse/HDFS-14557
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.6.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14557.001.patch
>
>
> We saw the following error in JournalNodes a few times before.
> {noformat}
> 2016-09-22 12:44:24,505 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /data/1/dfs/current/edits_inprogress_0000000000000661942 while determining its valid length. Position was 761856
> java.io.IOException: Can't scan a pre-transactional edit log.
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4592)
> at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
> at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
> at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:193)
> at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:153)
> at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
> {noformat}
> The edit file was corrupt, and one possible culprit of this error is a full 
> disk. The JournalNode can't recover and must be resynced manually from other 
> JournalNodes. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
