[ 
https://issues.apache.org/jira/browse/HDDS-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-10488:
-------------------------------
    Target Version/s: 2.0.0, 1.4.2  (was: 2.0.0, 1.4.2, 2.0.1)

> Datanode OOM due to run out of mmap handler 
> --------------------------------------------
>
>                 Key: HDDS-10488
>                 URL: https://issues.apache.org/jira/browse/HDDS-10488
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>
> When I run command "yarn jar 
> /**/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar 
> TestDFSIO -Dfs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem 
> -Dfs.AbstractFileSystem.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzFs 
> -Dfs.defaultFS=ofs://ozone1708515436 -Dozone.client.bytes.per.checksum=1KB 
> -Dtest.build.data=ofs://ozone1708515436/s3v/testdfsio -write -nrFiles 64 
> -fileSize 1024MB" on an installed Ozone cluster, several DNs crashed due to 
> OOM. The following is the exception stack:
> {code:java}
> 6:52:03.601 AM  WARN  KeyValueHandler Operation: ReadChunk , Trace ID:  , 
> Message: java.io.IOException: Map failed , Result: IO_EXCEPTION , 
> StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.io.IOException: Map failed
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.wrapInStorageContainerException(ChunkUtils.java:471)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:226)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:260)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:194)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.readChunk(FilePerBlockStrategy.java:197)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.readChunk(ChunkManagerDispatcher.java:112)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:773)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:262)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:225)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:335)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:183)
>   at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
>   at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:182)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:105)
>   at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
>   at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>   at 
> org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Map failed
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$5(ChunkUtils.java:264)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.lambda$readData$4(ChunkUtils.java:218)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.processFileExclusively(ChunkUtils.java:411)
>   at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:215)
>   ... 24 more
> Caused by: java.lang.OutOfMemoryError: Map failed
>   at sun.nio.ch.FileChannelImpl.map0(Native Method)
>   at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
>   ... 28 more
> {code}
> In the "Dynamic libraries" section of file hs_err_pid1560151.log, there are 
> 261425 mapped regions for different block files, for example 
> "/hadoop-ozone/datanode/data/hdds/CID-303529f3-9f2b-4427-b389-6909971e960a/current/containerDir3/2005/chunks/113750153625618247.block"
> At the OS level, the maximum number of mapped memory regions per process is 
> stored in the file /proc/sys/vm/max_map_count, which has the value 262144 on 
> this host.
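A quick sanity check of the numbers above: with 261425 regions already mapped against a limit of 262144, the datanode had only 719 regions of headroom left before `FileChannelImpl.map` started failing. A minimal sketch (the helper name `parseMaxMapCount` is hypothetical; on Linux the string would come from reading /proc/sys/vm/max_map_count):

```java
public class MaxMapCountCheck {
    /** Parse the contents of /proc/sys/vm/max_map_count (a single integer line). */
    static long parseMaxMapCount(String procContents) {
        return Long.parseLong(procContents.trim());
    }

    public static void main(String[] args) {
        // Value observed on the crashed datanode's host:
        long limit = parseMaxMapCount("262144\n");
        // Mapped-region count reported in hs_err_pid1560151.log:
        long mappedRegions = 261425;
        System.out.println("headroom = " + (limit - mappedRegions)); // 719 regions left
    }
}
```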
> HDDS-7117 introduced MappedByteBuffer to improve chunk read performance. 
> The property "ozone.chunk.read.mapped.buffer.threshold" (default 32KB) is 
> defined as the bar for deciding whether to use a MappedByteBuffer or a 
> normal ByteBuffer to read data. 
> If the read length is less than "ozone.chunk.read.mapped.buffer.threshold", 
> MappedByteBuffer should not be used, but this is not enforced in the 
> current implementation. 
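The intended gate can be sketched as follows. This is an illustrative stand-in, not the actual ChunkUtils code; the class and method names are hypothetical, and whether the real comparison is strict or inclusive at exactly the threshold is an assumption:

```java
public class MappedReadGate {
    // Mirrors ozone.chunk.read.mapped.buffer.threshold (default 32KB);
    // the constant name is invented for this sketch.
    static final int MAPPED_BUFFER_THRESHOLD = 32 * 1024;

    /** Decide whether a read of the given length should use a MappedByteBuffer. */
    static boolean shouldUseMappedBuffer(long readLen) {
        // Only mmap reads at least as large as the threshold; smaller reads
        // should fall back to a plain ByteBuffer so that each small read
        // does not consume one of the kernel's vm.max_map_count regions.
        return readLen >= MAPPED_BUFFER_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldUseMappedBuffer(8192));   // small read -> false
        System.out.println(shouldUseMappedBuffer(344064)); // large read -> true
    }
}
```

With this check in place, the 8192-byte slices seen in the debug log below would never be memory-mapped.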
> Here are the logs when the debug log level is enabled in the DN:
> {code:java}
> 2024-02-27 15:19:28,676 DEBUG 
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
>  mapped: offset=213565440, readLen=0, n=8192, class java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG 
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
>  mapped: offset=213565440, readLen=8192, n=8192, class 
> java.nio.DirectByteBufferR
> ...
> 2024-02-27 15:19:28,676 DEBUG 
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
>  mapped: offset=213565440, readLen=327680, n=8192, class 
> java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG 
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
>  mapped: offset=213565440, readLen=335872, n=8192, class 
> java.nio.DirectByteBufferR
> 2024-02-27 15:19:28,676 DEBUG 
> [f22679a0-7e8c-4006-a6ab-874736e9c75a-ChunkReader-3]-org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils:
>  Read 344064 bytes starting at offset 213565440 from 
> /hadoop-ozone/datanode/data/hdds/CID-303529f3-9f2b-4427-b389-6909971e960a/current/containerDir9/5007/chunks/113750153625622210.block
> {code}
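Reading the log above with readLen as the cumulative bytes read and n=8192 as the bytes mapped per call (my interpretation, not confirmed against the code), a single 344064-byte ReadChunk performs 344064 / 8192 = 42 separate map operations, each consuming one mapped region. A quick arithmetic sketch of why the 262144 limit is reached under the TestDFSIO workload:

```java
public class MmapRegionEstimate {
    public static void main(String[] args) {
        long sliceSize = 8192;    // n=8192 in each "mapped:" debug line
        long readLen = 344064;    // total bytes of the logged ReadChunk

        // Mapped regions consumed by just this one request:
        long regionsPerRead = readLen / sliceSize;
        System.out.println("regions per read = " + regionsPerRead); // 42

        // The workload writes 64 files x 1 GiB; mapping every 8 KiB slice
        // on read-back would need far more regions than the default
        // vm.max_map_count of 262144:
        long totalSlices = 64L * 1024 * 1024 * 1024 / sliceSize;
        System.out.println("total 8KiB slices = " + totalSlices); // 8388608
    }
}
```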
> As a result, the DN hits OutOfMemoryError once the process exhausts its 
> quota of mapped regions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
