[
https://issues.apache.org/jira/browse/HDDS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arpit Agarwal resolved HDDS-543.
--------------------------------
Resolution: Done
This should be fixed as part of the restart fixes per [~msingh].
> datanode service start throwing NullPointerException
> ----------------------------------------------------
>
> Key: HDDS-543
> URL: https://issues.apache.org/jira/browse/HDDS-543
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Nilotpal Nandi
> Assignee: Mukul Kumar Singh
> Priority: Critical
> Attachments: all-node-ozone-logs-1537784799.tar.gz
>
>
> steps taken :
> -------------------
> # Stopped one of the nodes in 6 node cluster while write operation was
> running.
> # Tried to start datanode after sometime. Datanode failed to start
>
> ozone version :
> ---------------------
>
> {noformat}
> Source code repository [email protected]:apache/hadoop.git -r
> 968082ffa5d9e50ed8538f653c375edd1b8feea5
> Compiled by elek on 2018-09-19T20:57Z
> Compiled with protoc 2.5.0
> From source with checksum efbdeabb5670d69d9efde85846e4ee98
> Using HDDS 0.2.1-alpha
> Source code repository [email protected]:apache/hadoop.git -r
> 968082ffa5d9e50ed8538f653c375edd1b8feea5
> Compiled by elek on 2018-09-19T20:56Z
> Compiled with protoc 2.5.0
> From source with checksum 8bf78cff4b73c95d486da5b21053ef
>
> {noformat}
>
> console log :
> --------------------
>
> {noformat}
> 2018-09-24 09:53:54,913 INFO impl.RaftServerImpl:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0:
> [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858,
> 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858,
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
> Sep 24, 2018 9:53:55 AM
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25
> AM'
> 2018-09-24 09:53:55,968 [main] INFO - Listening HDDS REST traffic on
> /0.0.0.0:9880
> 2018-09-24 09:53:55,969 [main] INFO - Started plug-in
> org.apache.hadoop.ozone.web.OzoneHddsDatanodeService@2fa3be26
> 2018-09-24 09:53:55,975 [Datanode State Machine Thread - 0] INFO - Attempting
> to start container services.
> 2018-09-24 09:53:56,171 [Datanode State Machine Thread - 0] INFO - Starting
> XceiverServerRatis 9348f4cb-0bd3-41e7-b1c6-c22463a20c09 at port 9858
> 2018-09-24 09:53:56,193 [Datanode State Machine Thread - 0] ERROR - Unable to
> start the DatanodeState Machine
> java.io.IOException: Premature EOF from inputStream
> at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
> at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
> at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
> at
> org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
> at
> org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
> at
> org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132)
> at
> org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110)
> at
> org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155)
> at
> org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123)
> at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162)
> at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110)
> at
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106)
> at
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> at
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> 2018-09-24 09:53:56,195 [main] ERROR - Exception in HddsDatanodeService.
> java.lang.NullPointerException
> at
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332)
> at
> org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191)
> at
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250)
> 2018-09-24 09:53:56,198 INFO util.ExitUtil: Exiting with status 1:
> java.lang.NullPointerException
> 2018-09-24 09:53:56,201 WARN fs.CachingGetSpaceUsed: Thread Interrupted
> waiting to refresh disk information: sleep interrupted
> 2018-09-24 09:53:56,201 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down HddsDatanodeService at
> ctr-e138-1518143905142-468367-01-000003.hwx.site/172.27.23.161
>
> {noformat}
>
> datanode.log
> ----------------------
> {noformat}
> 2018-09-24 09:42:41,305 ERROR org.apache.ratis.server.impl.RaftServerImpl:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: applyTransaction failed for index:2
> proto:(t:2, i:2)SMLOGENTRY, client-DBCE827E62E6, cid=1
> 2018-09-24 09:42:41,307 ERROR
> org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit
> status 2: StateMachineUpdater-9348f4cb-0bd3-41e7-b1c6-c22463a20c09: the
> StateMachineUpdater hits Throwable
> java.lang.NullPointerException
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.handleChunkCommit(ContainerStateMachine.java:561)
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.executeContainerCommand(ContainerStateMachine.java:615)
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:394)
> at
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1150)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:148)
> at java.lang.Thread.run(Thread.java:745)
> 2018-09-24 09:42:41,312 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread
> Interrupted waiting to refresh disk information: sleep interrupted
> 2018-09-24 09:53:38,901 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
> Loaded properties from hadoop-metrics2.properties
> 2018-09-24 09:53:39,013 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot
> period at 10 second(s).
> 2018-09-24 09:53:39,013 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HddsDatanode metrics
> system started
> 2018-09-24 09:53:39,056 INFO
> org.apache.hadoop.conf.Configuration.deprecation: No unit for
> ozone.scm.heartbeat.rpc-timeout(1000) assuming MILLISECONDS
> 2018-09-24 09:53:39,069 INFO
> org.apache.hadoop.conf.Configuration.deprecation: No unit for
> hdds.write.lock.reporting.threshold.ms(5000) assuming MILLISECONDS
> 2018-09-24 09:53:39,069 INFO
> org.apache.hadoop.conf.Configuration.deprecation: No unit for
> hdds.lock.suppress.warning.interval.ms(10000) assuming MILLISECONDS
> 2018-09-24 09:53:39,751 WARN org.apache.hadoop.hdds.scm.HddsServerUtil:
> Storage directory for Ratis is not configured.Mapping Ratis storage under
> /data/disk1/ozone/meta/ratis. It is a good idea to map this to an SSD disk.
> 2018-09-24 09:53:39,783 INFO org.apache.ratis.server.impl.RaftServerProxy:
> raft.rpc.type = GRPC (default)
> 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.GrpcConfigKeys$Server:
> raft.grpc.server.port = 9858 (custom)
> 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.RaftGRpcService:
> raft.grpc.message.size.max = 167772160 (custom)
> 2018-09-24 09:53:39,795 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.appender.buffer.capacity = 134217728 (custom)
> 2018-09-24 09:53:39,795 INFO org.apache.ratis.grpc.RaftGRpcService:
> raft.grpc.flow.control.window = 1MB (=1048576) (default)
> 2018-09-24 09:53:39,796 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.request.timeout = 3000 ms (custom)
> 2018-09-24 09:53:39,835 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
> 2018-09-24 09:53:39,836 INFO org.apache.ratis.server.impl.RaftServerProxy:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: found a subdirectory
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
> 2018-09-24 09:53:39,838 INFO org.apache.ratis.server.impl.RaftServerProxy:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: addNew group-85804B25606D:[] returns
> group-85804B25606D:java.util.concurrent.CompletableFuture@74e28667[Not
> completed]
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.timeout.min = 1 s (custom)
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.timeout.max = 1200 ms (custom)
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpcslowness.timeout = 120 s (custom)
> 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: ConfigurationManager, init=-1: [],
> old=null, confs=<EMPTY_MAP>
> 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
> 2018-09-24 09:53:39,873 INFO
> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d/in_use.lock
> acquired by nodename [email protected]
> 2018-09-24 09:53:39,878 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.leader.election.timeout = 120 s (custom)
> 2018-09-24 09:53:39,881 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.use.memory = false (default)
> 2018-09-24 09:53:39,882 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.appender.buffer.capacity = 134217728 (custom)
> 2018-09-24 09:53:39,883 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.size.max = 1073741824 (custom)
> 2018-09-24 09:53:39,886 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.cache.num.max = 2 (custom)
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.storage.RaftLogWorker:
> new 9348f4cb-0bd3-41e7-b1c6-c22463a20c09-RaftLogWorker for Storage Directory
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.size.max = 1073741824 (custom)
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.preallocated.size = 134217728 (custom)
> 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.write.buffer.size = 33554432 (custom)
> 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.force.sync.num = 128 (default)
> 2018-09-24 09:53:39,925 INFO org.apache.ratis.server.impl.RaftServerImpl:
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0:
> [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858,
> 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858,
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
> 2018-09-24 09:53:41,058 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> status 1: java.lang.NullPointerException
> 2018-09-24 09:53:41,065 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread
> Interrupted waiting to refresh disk information: sleep interrupted{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]