Sammi Chen created HDDS-10488:
---------------------------------

             Summary: Datanode OOM due to running out of mmap handles 
                 Key: HDDS-10488
                 URL: https://issues.apache.org/jira/browse/HDDS-10488
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Sammi Chen
            Assignee: Sammi Chen


When I ran the command "yarn jar 
/**/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO 
-Dfs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem 
-Dfs.AbstractFileSystem.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzFs 
-Dfs.defaultFS=ofs://ozone1708515436 -Dozone.client.bytes.per.checksum=1KB 
-Dtest.build.data=ofs://ozone1708515436/s3v/testdfsio -write -nrFiles 64 
-fileSize 1024MB" on an installed Ozone cluster, several DNs crashed due to OOM. 
Here is the exception stack:

{code:java}
6:52:03.601 AM  WARN  KeyValueHandler Operation: ReadChunk , Trace ID:  , Message: java.io.IOException: Map failed , Result: IO_EXCEPTION , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.io.IOException: Map failed
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.wrapInStorageContainerException(ChunkUtils.java:471)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:226)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:260)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:194)
  at org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.readChunk(FilePerBlockStrategy.java:197)
  at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.readChunk(ChunkManagerDispatcher.java:112)
  at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:773)
  at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:262)
  at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:225)
  at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:335)
  at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:183)
  at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
  at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:182)
  at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:112)
  at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:105)
  at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
  at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
  at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
  at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
  at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
  at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
  at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
  at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map failed
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$5(ChunkUtils.java:264)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$4(ChunkUtils.java:218)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.processFileExclusively(ChunkUtils.java:411)
  at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:215)
  ... 24 more
Caused by: java.lang.OutOfMemoryError: Map failed
  at sun.nio.ch.FileChannelImpl.map0(Native Method)
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
  ... 28 more
{code}
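
The "OutOfMemoryError: Map failed" at the bottom means the native mmap call itself failed (for example because the process hit the kernel's map-area limit), not that the Java heap was full. The following is a standalone sketch that reproduces the same error by keeping mappings alive until the limit is reached; it is illustrative only (class name, file name and sizes are made up) and should not be run on a production node.

{code:java}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class MapFailedRepro {
  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("mmap-repro", ".bin");
    f.deleteOnExit();
    try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
         FileChannel ch = raf.getChannel()) {
      raf.setLength(4096);
      // Holding references prevents the buffers from being GC'd, so the
      // underlying mappings are never released.
      List<MappedByteBuffer> keepAlive = new ArrayList<>();
      // Each map() call consumes one VM map area. Once the process
      // approaches vm.max_map_count, map() fails with
      // "java.io.IOException: Map failed" caused by OutOfMemoryError.
      for (long i = 0; ; i++) {
        keepAlive.add(ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096));
        if (i % 10000 == 0) {
          System.out.println("mapped regions so far: " + i);
        }
      }
    }
  }
}
{code}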

In the "Dynamic libraries" section of file hs_err_pid1560151.log,  there are 
261425 mapped regions for different block files, for example 
"/hadoop-ozone/datanode/data/hdds/CID-303529f3-9f2b-4427-b389-6909971e960a/current/containerDir3/2005/chunks/113750153625618247.block"
 

At the OS level, the maximum number of memory map areas a process may have is 
stored in /proc/sys/vm/max_map_count, which is 262144 on these nodes. The 
261425 mappings above mean the DN process had nearly exhausted that quota.
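
A quick way to see how close a DN process is to that limit is to compare the number of lines in /proc/<pid>/maps with /proc/sys/vm/max_map_count. A minimal sketch (Linux only; the class name is made up for illustration):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class MapCountCheck {
  public static void main(String[] args) throws IOException {
    // Pid of the datanode process; falls back to this JVM ("self") if omitted.
    String pid = args.length > 0 ? args[0] : "self";
    long used;
    try (Stream<String> maps = Files.lines(Paths.get("/proc", pid, "maps"))) {
      used = maps.count();   // one line per mapped region
    }
    String limit = new String(
        Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count"))).trim();
    System.out.println("mapped regions: " + used + " of limit " + limit);
  }
}
{code}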

In HDDS-7117, MappedByteBuffer was introduced to improve chunk read 
performance. 
The property "ozone.chunk.read.mapped.buffer.threshold", with a value of 32KB, 
is defined as the bar for deciding whether to use a MappedByteBuffer or a 
normal ByteBuffer to read data. 
If the read length is less than "ozone.chunk.read.mapped.buffer.threshold", a 
MappedByteBuffer should not be used, but this is not enforced in the current 
implementation. Because of this, the DN goes out of memory once the mmap 
handles run out. 
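
The kind of gate the description implies could look like the sketch below. This is illustrative only, not the actual ChunkUtils code; the class, method, and constant names are made up, and only the 32KB threshold value comes from this report. The idea is to map the file only when the requested length reaches the threshold, and fall back to a plain heap ByteBuffer read otherwise so that small reads never consume mmap handles.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Illustrative sketch only; not the actual ChunkUtils implementation. */
public final class MappedReadGate {

  // Stand-in for ozone.chunk.read.mapped.buffer.threshold (32KB per the report).
  private static final long MAPPED_BUFFER_THRESHOLD = 32 * 1024;

  static ByteBuffer readChunk(FileChannel channel, long offset, int len)
      throws IOException {
    if (len >= MAPPED_BUFFER_THRESHOLD) {
      // Large read: a memory-mapped buffer avoids an extra copy, but each
      // call counts against vm.max_map_count until the buffer is released.
      return channel.map(FileChannel.MapMode.READ_ONLY, offset, len);
    }
    // Small read: a regular heap buffer costs a copy but no mmap handle.
    ByteBuffer buf = ByteBuffer.allocate(len);
    int read = 0;
    while (read < len) {
      int n = channel.read(buf, offset + read);
      if (n < 0) {
        break; // reached EOF before len bytes
      }
      read += n;
    }
    buf.flip();
    return buf;
  }
}
{code}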



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
