[
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDFS-9249:
----------------------------------
Attachment: HDFS-9249.005.patch
Thanks [~yzhangal]!
Rev 5 is based on Yongjun's suggestions. The improved code looks cleaner and
handles more cases.
> NPE thrown if an IOException is thrown in NameNode.<init>
> ---------------------------------------------------------
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.1
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Minor
> Labels: supportability
> Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch,
> HDFS-9249.003.patch, HDFS-9249.004.patch, HDFS-9249.005.patch
>
>
> This issue was found when running test case
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not
> due to the test case.
> Looks like an IOException was thrown in
> try {
> initializeGenericKeys(conf, nsId, namenodeId);
> initialize(conf);
> try {
> haContext.writeLock();
> state.prepareToEnterState(haContext);
> state.enterState(haContext);
> } finally {
> haContext.writeUnlock();
> }
> causing the namenode to stop, but the namesystem was not yet properly
> instantiated, causing NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what
> caused the IOException, I suggest make this a supportability JIRA to log the
> exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at
> org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of
> syncs: 4 SyncTimes(ms): 2 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted,
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000001
> ->
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000001
> ->
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor
> (CacheReplicationMonitor.java:run(169)) - Shutting down
> CacheReplicationMonitor
> 2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping
> server on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC
> Server listener on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC
> Server Responder
> 2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager
> (BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
> 2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager
> (DecommissionManager.java:run(78)) - Monitor interrupted:
> java.lang.InterruptedException: sleep interrupted
> 2015-10-14 19:45:07,844 INFO namenode.FSNamesystem
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for
> active state
> 2015-10-14 19:45:07,845 INFO namenode.FSNamesystem
> (FSNamesystem.java:stopStandbyServices(1386)) - Stopping services started for
> standby state
> 2015-10-14 19:45:07,848 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped
> HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)