Rakesh R created HDDS-1687:
------------------------------
Summary: Datanode process shutdown due to OOME
Key: HDDS-1687
URL: https://issues.apache.org/jira/browse/HDDS-1687
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
Attachments: baseline test - datanode error logs.0.5.0.rar
Ran the Freon benchmark on a three-node cluster. With a higher number of
parallel writer threads, the datanode daemon hits an OOME and shuts down. The
worker nodes use HDD as the storage type.
+Freon was run with the args:+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100
--numOfThreads=100
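For scale, here is a rough sketch of how much data these arguments ask Freon to write. It assumes --numOfBuckets means buckets per volume and --numOfKeys means keys per bucket; that reading is an assumption, and the class and method names are illustrative only. Replication is not included.

```java
public class FreonWorkloadEstimate {

    // Total keys, assuming buckets are counted per volume and keys per bucket.
    public static long totalKeys(long volumes, long bucketsPerVolume, long keysPerBucket) {
        return volumes * bucketsPerVolume * keysPerBucket;
    }

    // Logical bytes written, before any replication.
    public static long totalBytes(long keys, long keySizeBytes) {
        return keys * keySizeBytes;
    }

    public static void main(String[] args) {
        long keys = totalKeys(100, 10, 8);            // numOfVolumes, numOfBuckets, numOfKeys
        long bytes = totalBytes(keys, 67_108_864L);   // keySize = 64 MiB
        System.out.println("keys = " + keys);                                  // keys = 8000
        System.out.println("logical data = " + (bytes >> 30) + " GiB");        // logical data = 500 GiB
    }
}
```

Under that assumption the run writes 8000 keys of 64 MiB each, about 500 GiB of logical data, driven by 100 writer threads.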
*DN2*: Process got killed during the test due to OOME.
{code}
2019-06-13 00:48:11,976 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating with exit status 1: a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory /data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:694)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
    at java.lang.Thread.run(Thread.java:748)
{code}
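The trace above fails inside ByteBuffer.allocateDirect, i.e. the JVM's off-heap direct buffer pool was exhausted rather than the Java heap. As a diagnostic aid only, not a fix, here is a minimal sketch of reading that pool's usage through the standard java.lang.management API. The class name DirectBufferUsage is hypothetical, and the pool name "direct" is what HotSpot reports.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {

    // Bytes currently reserved by the JVM's "direct" buffer pool, or -1 if the pool is not reported.
    public static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Same kind of allocation as in the stack trace, here just 1 MiB.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directMemoryUsed();
        System.out.println("direct pool grew by " + (after - before)
            + " bytes for a buffer of capacity " + buffer.capacity());
    }
}
```

Sampling this value periodically during a Freon run would show whether direct memory grows with the number of open Raft log segments. On HotSpot the ceiling for this pool is set by -XX:MaxDirectMemorySize.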
*DN3*: Process got killed during the test due to OOME. There are also many
NPEs in the datanode logs.
{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected exception
java.lang.NullPointerException: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: Previous TermIndex not found for firstIndex = 10062
    at java.util.Objects.requireNonNull(Objects.java:290)
    at org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
    at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
    at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
    at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
    at java.lang.Thread.run(Thread.java:748)
{code}
OOME log messages are also present in the *.out file:
{code}
Exception in thread "org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
    at org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
    at org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
    at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
    at java.lang.Thread.run(Thread.java:748)
{code}
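The "unable to create new native thread" error on DN3 is raised from Thread.start, and the trace shows the appender daemon's run() calling restartSender, which starts new appenders. It may therefore help to track the thread count while the NPE loop is happening. Below is a minimal sketch using standard JDK calls. The class name ThreadCountProbe is hypothetical, and which thread-name prefix to filter on in a datanode is left as an assumption for whoever runs it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountProbe {

    // Number of live threads in this JVM right now.
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Highest live thread count seen since JVM start (or since the peak was last reset).
    public static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    // Number of live threads whose name starts with the given prefix.
    public static long countByNamePrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().startsWith(prefix))
            .count();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live=" + liveThreads()
            + " peak=" + peakThreads()
            + " started=" + threads.getTotalStartedThreadCount());
    }
}
```

A steadily rising total-started count with a flat live count would point at rapid thread churn, while a rising live count would point at threads that never exit.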