[ https://issues.apache.org/jira/browse/HDFS-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896055#comment-16896055 ]
Stephen O'Donnell commented on HDFS-14557:
------------------------------------------
If the next and only other edit in the file is partial, the problem still occurs, but the position is set to the end of the file. For example, this test sets the layout version as before, writes an edit, and then removes the last byte of the file:
{code}
// Added to TestEditLogFileInputStream; relies on that class's imports and LOG.
@Test
public void testScanCorruptEditLog2() throws Exception {
  Configuration conf = new Configuration();
  File editLog = new File(GenericTestUtils.getTempPath("testCorruptEditLog"));
  LOG.debug("Creating test edit log file: " + editLog);
  EditLogFileOutputStream elos = new EditLogFileOutputStream(conf,
      editLog.getAbsoluteFile(), 8192);
  // Use an old, pre-transactional layout version instead of
  // NameNodeLayoutVersion.CURRENT_LAYOUT_VERSION.
  elos.create(-27);

  // Write a single mkdir op.
  FSEditLogOp.OpInstanceCache cache = new FSEditLogOp.OpInstanceCache();
  FSEditLogOp.MkdirOp mkdirOp = FSEditLogOp.MkdirOp.getInstance(cache);
  mkdirOp.reset();
  mkdirOp.setRpcCallId(123);
  mkdirOp.setTransactionId(1);
  mkdirOp.setInodeId(789L);
  mkdirOp.setPath("/mydir");
  PermissionStatus perms = PermissionStatus.createImmutable(
      "myuser", "mygroup", FsPermission.createImmutable((short) 0777));
  mkdirOp.setPermissionStatus(perms);
  elos.write(mkdirOp);
  elos.setReadyToFlush();
  elos.flushAndSync(false);
  elos.close();

  // Make the only edit partial by removing the last byte of the file.
  long fileLen = editLog.length();
  LOG.info("Corrupting last edit in the file, by removing the last byte");
  try (RandomAccessFile rwf = new RandomAccessFile(editLog, "rw")) {
    rwf.setLength(fileLen - 1);
  }

  // Scan the truncated log; this is where the warnings below are logged.
  FSEditLogLoader.EditLogValidation val =
      EditLogFileInputStream.scanEditLog(editLog, 2, false);
}
{code}
This gives errors like the following, with the same pair of warnings repeating at position 78:
{code}
2019-07-30 12:49:32,697 [Time-limited test] WARN namenode.FSImage (FSEditLogLoader.java:scanEditLog(1287)) - After resync, position is 78
2019-07-30 12:49:32,697 [Time-limited test] WARN namenode.FSImage (FSEditLogLoader.java:scanEditLog(1282)) - Caught exception after scanning through 0 ops from /Users/sodonnell/source/upstream_hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/testCorruptEditLog while determining its valid length. Position was 78
java.io.IOException: Can't scan a pre-transactional edit log.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:5264)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:261)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1278)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:345)
    at org.apache.hadoop.hdfs.server.namenode.TestEditLogFileInputStream.testScanCorruptEditLog2(TestEditLogFileInputStream.java:210)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
2019-07-30 12:49:32,697 [Time-limited test] WARN namenode.FSImage (FSEditLogLoader.java:scanEditLog(1287)) - After resync, position is 78
2019-07-30 12:49:32,697 [Time-limited test] WARN namenode.FSImage (FSEditLogLoader.java:scanEditLog(1282)) - Caught exception after scanning through 0 ops from /Users/sodonnell/source/upstream_hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/testCorruptEditLog while determining its valid length. Position was 78
java.io.IOException: Can't scan a pre-transactional edit log.
{code}
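Because the same pair of warnings repeats with the position stuck at 78, it looks as if resync does not move the reader past the bad bytes, so the scan retries the same offset. The following is a simplified, hypothetical model of that catch/resync/retry loop, not the Hadoop code: OpScanner, StuckScanner, scan() and the stopIfStuck/maxFailures parameters are invented for illustration, and the "stop if the position did not advance" check is only one possible guard.

```java
import java.io.IOException;

/**
 * Hypothetical model of a "scan, catch, resync, retry" loop. None of these
 * types are Hadoop classes; they only mimic the behaviour suggested by the log.
 */
public class ResyncLoopSketch {

  /** Minimal stand-in for an edit log reader (hypothetical interface). */
  interface OpScanner {
    long getPosition();

    /** Returns the next txid, or -1 at a clean end of the log. */
    long scanNextOp() throws IOException;

    /** Tries to skip past corrupt data. */
    void resync();
  }

  /** Always fails at a fixed offset; resync() cannot move past it. */
  static final class StuckScanner implements OpScanner {
    private final long pos;

    StuckScanner(long pos) {
      this.pos = pos;
    }

    @Override
    public long getPosition() {
      return pos;
    }

    @Override
    public long scanNextOp() throws IOException {
      throw new IOException("Can't scan a pre-transactional edit log.");
    }

    @Override
    public void resync() {
      // The position stays the same, like "After resync, position is 78".
    }
  }

  /**
   * Scans until a clean end of log and returns the number of caught failures.
   * maxFailures exists only so the unguarded variant terminates in this demo.
   */
  static int scan(OpScanner in, boolean stopIfStuck, int maxFailures) {
    int failures = 0;
    while (failures < maxFailures) {
      long lastPos = in.getPosition();
      try {
        if (in.scanNextOp() == -1) {
          break; // clean end of log
        }
      } catch (IOException e) {
        failures++;
        in.resync();
        // Hypothetical guard: if resync did not advance the position, a retry
        // would hit the same bytes again, so skip the rest of the log.
        if (stopIfStuck && in.getPosition() <= lastPos) {
          break;
        }
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + scan(new StuckScanner(78), false, 1000));
    System.out.println("with guard: " + scan(new StuckScanner(78), true, 1000));
  }
}
```

Running main prints `without guard: 1000` and `with guard: 1`; without the check, the loop in this model only stops because of the artificial maxFailures cap.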
However, if the layout version is "correct" and the last edit is truncated, you get an EOFException instead:
{code}
2019-07-30 12:45:52,148 [Time-limited test] WARN namenode.FSImage (FSEditLogLoader.java:scanEditLog(1282)) - Caught exception after scanning through 0 ops from /Users/sodonnell/source/upstream_hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/testCorruptEditLog while determining its valid length. Position was 8
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:5146)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.scanOp(FSEditLogOp.java:5095)
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:261)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1278)
{code}
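Here the top frame is DataInputStream.readInt running out of bytes while decoding the op frame. A minimal sketch of why a truncated tail surfaces as a plain EOFException, assuming a hypothetical frame layout of [int length][payload][int CRC32]; this is not the real HDFS edit-op format, and TruncatedFrameSketch, writeFrame and readFrame are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical frame layout, NOT the HDFS edit log format:
// [int length][payload bytes][int CRC32 of payload]
public class TruncatedFrameSketch {

  static byte[] writeFrame(byte[] payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(payload.length);
    out.write(payload);
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    out.writeInt((int) crc.getValue());
    out.flush();
    return bytes.toByteArray();
  }

  // Returns the payload; throws EOFException if the frame is cut short.
  static byte[] readFrame(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int length = in.readInt();
    byte[] payload = new byte[length];
    in.readFully(payload);
    // readInt needs four bytes; a truncated checksum makes it throw EOFException.
    int stored = in.readInt();
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    if (stored != (int) crc.getValue()) {
      throw new IOException("checksum mismatch");
    }
    return payload;
  }

  public static void main(String[] args) throws IOException {
    byte[] frame = writeFrame("mkdir /mydir".getBytes(StandardCharsets.UTF_8));
    // Remove the last byte, as the test above does to the edit log file.
    byte[] truncated = Arrays.copyOf(frame, frame.length - 1);
    try {
      readFrame(truncated);
      System.out.println("read succeeded");
    } catch (EOFException e) {
      System.out.println("EOFException");
    }
  }
}
```

Removing one byte leaves only three bytes of the trailing checksum, so readInt hits end-of-stream and DataInputStream throws EOFException rather than a format-specific error, which is consistent with the top frame of the trace above.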
> JournalNode error: Can't scan a pre-transactional edit log
> ----------------------------------------------------------
>
> Key: HDFS-14557
> URL: https://issues.apache.org/jira/browse/HDFS-14557
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.6.0
> Reporter: Wei-Chiu Chuang
> Priority: Major
>
> We saw the following error in JournalNodes a few times before.
> {noformat}
> 2016-09-22 12:44:24,505 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /data/1/dfs/current/edits_inprogress_0000000000000661942 while determining its valid length. Position was 761856
> java.io.IOException: Can't scan a pre-transactional edit log.
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4592)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
>     at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
>     at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:193)
>     at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:153)
>     at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
> {noformat}
> The edit file was corrupt, and one possible culprit of this error is a full
> disk. The JournalNode can't recover and must be resynced manually from other
> JournalNodes.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)