Rakesh R created HDFS-3382: ------------------------------ Summary: BookKeeperJournalManager: NN startup is failing, when tries to recoverUnfinalizedSegments() a bad inProgress_ ZNodes Key: HDFS-3382 URL: https://issues.apache.org/jira/browse/HDFS-3382 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Fix For: 0.24.0
Say, the InProgress_000X node is corrupted due to not writing the data(version, ledgerId, firstTxId) to this inProgress_000X znode. Namenode startup has the logic to recover all the unfinalized segments, here will try to read the segment and getting shutdown. {noformat} EditLogLedgerMetadata.java: static EditLogLedgerMetadata read(ZooKeeper zkc, String path) throws IOException, KeeperException.NoNodeException { byte[] data = zkc.getData(path, false, null); String[] parts = new String(data).split(";"); if (parts.length == 3) ....reading inprogress metadata else if (parts.length == 4) ....reading inprogress metadata else throw new IOException("Invalid ledger entry, " + new String(data)); } {noformat} Scenario:- Leaving bad inProgress_000X node ? Assume BKJM has created the inProgress_000X zNode and ZK is not available when trying to add the metadata. Now, inProgress_000X ends up with partial information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira