[
https://issues.apache.org/jira/browse/HDDS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-10488:
-------------------------------
Target Version/s: 2.0.0, 1.4.2 (was: 2.0.0, 1.4.2, 2.0.1)
> Datanode OOM due to run out of mmap handler
> --------------------------------------------
>
> Key: HDDS-10488
> URL: https://issues.apache.org/jira/browse/HDDS-10488
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
>
> When I ran the command "yarn jar
> /**/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
> TestDFSIO -Dfs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
> -Dfs.AbstractFileSystem.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzFs
> -Dfs.defaultFS=ofs://ozone1708515436 -Dozone.client.bytes.per.checksum=1KB
> -Dtest.build.data=ofs://ozone1708515436/s3v/testdfsio -write -nrFiles 64
> -fileSize 1024MB" against an installed Ozone cluster, several DNs crashed due
> to OOM. Following is the exception stack:
> {code:java}
> 6:52:03.601 AM WARN KeyValueHandler Operation: ReadChunk , Trace ID: ,
> Message: java.io.IOException: Map failed , Result: IO_EXCEPTION ,
> StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> java.io.IOException: Map failed
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.wrapInStorageContainerException(ChunkUtils.java:471)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:226)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:260)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:194)
> at
> org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.readChunk(FilePerBlockStrategy.java:197)
> at
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.readChunk(ChunkManagerDispatcher.java:112)
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:773)
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:262)
> at
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:225)
> at
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:335)
> at
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:183)
> at
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
> at
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:182)
> at
> org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:112)
> at
> org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:105)
> at
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> at
> org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Map failed
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$5(ChunkUtils.java:264)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$4(ChunkUtils.java:218)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.processFileExclusively(ChunkUtils.java:411)
> at
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:215)
> ... 24 more
> Caused by: java.lang.OutOfMemoryError: Map failed
> at sun.nio.ch.FileChannelImpl.map0(Native Method)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
> ... 28 more
> {code}
> In the "Dynamic libraries" section of file hs_err_pid1560151.log, there are
> 261425 mapped regions for different block files, for example
> "/hadoop-ozone/datanode/data/hdds/CID-303529f3-9f2b-4427-b389-6909971e960a/current/containerDir3/2005/chunks/113750153625618247.block"
> On OS level, the max mmap handler count is saved in file
> /proc/sys/vm/max_map_count, which has value 262144.
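> For reference, here is a minimal standalone sketch (not Ozone code; the class
> name is made up for illustration) that should reproduce the same failure mode
> by pinning many small mappings of a single file until
> /proc/sys/vm/max_map_count is exhausted:
> {code:java}
> import java.io.IOException;
> import java.nio.MappedByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.StandardOpenOption;
> import java.util.ArrayList;
> import java.util.List;
>
> public class MapCountExhaustion {
>   public static void main(String[] args) throws IOException {
>     Path file = Files.createTempFile("mmap-test", ".block");
>     Files.write(file, new byte[8192]);
>     // Keep references so the mappings stay alive (a MappedByteBuffer is
>     // typically only unmapped once it is garbage collected).
>     List<MappedByteBuffer> pinned = new ArrayList<>();
>     try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
>       while (true) {
>         // Each map() call adds one more region counted against vm.max_map_count.
>         pinned.add(ch.map(FileChannel.MapMode.READ_ONLY, 0, 8192));
>       }
>     } catch (IOException e) {
>       // Expected once the per-process mapping limit is reached:
>       // java.io.IOException: Map failed (caused by OutOfMemoryError: Map failed).
>       System.err.println("Failed after " + pinned.size() + " mappings: " + e);
>     }
>   }
> }
> {code}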
> In HDDS-7117, MappedByteBuffer was introduced to improve chunk read
> performance.
> The property "ozone.chunk.read.mapped.buffer.threshold", with value 32KB, is
> defined as a threshold to decide whether a MappedByteBuffer or a regular
> ByteBuffer is used to read data.
> If the read length is less than "ozone.chunk.read.mapped.buffer.threshold", a
> MappedByteBuffer should not be used, but this is not enforced in the current
> implementation; a sketch of the intended behavior follows.
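> The following is an illustrative sketch only (the class and method names are
> hypothetical, not the actual ChunkUtils code): reads below the threshold fall
> back to a regular ByteBuffer and never consume a mapped region.
> {code:java}
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
>
> final class ThresholdedChunkRead {
>   // Assumed to be parsed from "ozone.chunk.read.mapped.buffer.threshold" (32KB here).
>   private static final long MAPPED_BUFFER_THRESHOLD = 32 * 1024;
>
>   static ByteBuffer readChunk(FileChannel channel, long offset, int len)
>       throws IOException {
>     if (len >= MAPPED_BUFFER_THRESHOLD) {
>       // Large read: memory-map the range (one entry against vm.max_map_count).
>       return channel.map(FileChannel.MapMode.READ_ONLY, offset, len);
>     }
>     // Small read: use a heap ByteBuffer with positional reads, which consumes
>     // no mmap region.
>     ByteBuffer buf = ByteBuffer.allocate(len);
>     int read = 0;
>     while (read < len) {
>       int n = channel.read(buf, offset + read);
>       if (n < 0) {
>         break;  // EOF before len bytes; the caller handles the short read
>       }
>       read += n;
>     }
>     buf.flip();
>     return buf;
>   }
> }
> {code}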
> Here are the logs with the debug log level enabled on the DN:
> {code:java}
> 2024-02-27 15:19:28,676 DEBUG
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
> mapped: offset=213565440, readLen=0, n=8192, class java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
> mapped: offset=213565440, readLen=8192, n=8192, class
> java.nio.DirectByteBufferR
> ...
> 2024-02-27 15:19:28,676 DEBUG
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
> mapped: offset=213565440, readLen=327680, n=8192, class
> java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
> mapped: offset=213565440, readLen=335872, n=8192, class
> java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
> Read 344064 bytes starting at offset 213565440 from
> /hadoop-ozone/datanode/data/hdds/CID-303529f3-9f2b-4427-b389-6909971e960a/current/containerDir9/5007/chunks/113750153625622210.block
> {code}
> Each debug line above logs a new 8192-byte mapping (n=8192), so this single
> 344064-byte read alone created 42 mapped regions. Because reads below the
> threshold still go through mmap, the mapped regions accumulate until
> vm.max_map_count is exhausted, at which point map() fails and the DN goes down
> with OOM.