[
https://issues.apache.org/jira/browse/FLINK-33178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774858#comment-17774858
]
Piotr Nowojski commented on FLINK-33178:
----------------------------------------
I have a feeling this ticket is a duplicate of FLINK-11579 and is a "known
non-FLink issue".
[~iemre], can you try using OpenSSL instead of JDK's ?
> Highly parallel apps suffer from bottleneck in NativePRNG
> ----------------------------------------------------------
>
> Key: FLINK-33178
> URL: https://issues.apache.org/jira/browse/FLINK-33178
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Affects Versions: 1.13.2, 1.17.1
> Reporter: Emre Kartoglu
> Priority: Major
>
> I observed the below thread dumps that highlighted a potential bottleneck in
> Flink/Netty/JDK. The application (Flink 1.13) from which I took the thread
> dumps had very high parallelism and was distributed on nodes with >150GB
> random access memory.
> It appears that there is a call to "Arrays.copyOfRange" in a syncrhonized
> block in "sun.security.provider.NativePRNG", which blocks other threads
> waiting for the lock to the same synchronized block. This appears to be a
> problem only with highly parallel applications. I don't know exactly at what
> parallelism it starts becoming a problem, and how much of a bottleneck it
> actually is.
> I was also slightly hesitant about creating a Flink ticket as the improvement
> could well be made in Netty or even JDK. But I believe we should have a
> record of the issue in Flink Jira.
> Related: [https://bugs.openjdk.org/browse/JDK-8278371]
>
>
> {code:java}
> "Flink Netty Server (6121) Thread 43" #930 daemon prio=5 os_prio=0
> cpu=2298176.43ms elapsed=44352.31s allocated=155G defined_classes=0
> tid=0x00007f0a3397f800 nid=0x519 waiting for monitor entry
> [0x00007efc5d549000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> sun.security.provider.NativePRNG$RandomIO.implNextBytes([email protected]/NativePRNG.java:544)
> - waiting to lock <0x00007f0b62c2eee8> (a java.lang.Object)
> at
> sun.security.provider.NativePRNG.engineNextBytes([email protected]/NativePRNG.java:220)
> at
> java.security.SecureRandom.nextBytes([email protected]/SecureRandom.java:751)
> at
> sun.security.ssl.SSLCipher$T11BlockWriteCipherGenerator$BlockWriteCipher.encrypt([email protected]/SSLCipher.java:1498)
> at
> sun.security.ssl.OutputRecord.t10Encrypt([email protected]/OutputRecord.java:441)
> at
> sun.security.ssl.OutputRecord.encrypt([email protected]/OutputRecord.java:345)
> at
> sun.security.ssl.SSLEngineOutputRecord.encode([email protected]/SSLEngineOutputRecord.java:287)
> at
> sun.security.ssl.SSLEngineOutputRecord.encode([email protected]/SSLEngineOutputRecord.java:189)
> at
> sun.security.ssl.SSLEngineImpl.encode([email protected]/SSLEngineImpl.java:285)
> at
> sun.security.ssl.SSLEngineImpl.writeRecord([email protected]/SSLEngineImpl.java:231)
> at
> sun.security.ssl.SSLEngineImpl.wrap([email protected]/SSLEngineImpl.java:136)
> - eliminated <0x00007f0b6aab70c8> (a sun.security.ssl.SSLEngineImpl)
> at
> sun.security.ssl.SSLEngineImpl.wrap([email protected]/SSLEngineImpl.java:116)
> - locked <0x00007f0b6aab70c8> (a sun.security.ssl.SSLEngineImpl)
> at javax.net.ssl.SSLEngine.wrap([email protected]/SSLEngine.java:522)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1071)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:843)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:125)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:246)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.enqueueAvailableReader(PartitionRequestQueue.java:110)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:173)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:365)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:365)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1428)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:913)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.lambda$notifyReaderNonEmpty$0(PartitionRequestQueue.java:89)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue$$Lambda$1875/0x00007efc8a13b4b0.run(Unknown
> Source)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at java.lang.Thread.run([email protected]/Thread.java:829)"Flink
> Netty Server (6121) Thread 42" #929 daemon prio=5 os_prio=0 cpu=2371118.12ms
> elapsed=44352.38s allocated=162G defined_classes=0 tid=0x00007f0a3397e800
> nid=0x518 runnable [0x00007f0a1997c000]
> java.lang.Thread.State: RUNNABLE
> at java.util.Arrays.copyOfRange([email protected]/Arrays.java:4030)
> at
> sun.security.provider.NativePRNG$RandomIO.implNextBytes([email protected]/NativePRNG.java:554)
> - locked <0x00007f0b62c2eee8> (a java.lang.Object)
> at
> sun.security.provider.NativePRNG.engineNextBytes([email protected]/NativePRNG.java:220)
> at
> java.security.SecureRandom.nextBytes([email protected]/SecureRandom.java:751)
> at
> sun.security.ssl.SSLCipher$T11BlockWriteCipherGenerator$BlockWriteCipher.encrypt([email protected]/SSLCipher.java:1498)
> at
> sun.security.ssl.OutputRecord.t10Encrypt([email protected]/OutputRecord.java:441)
> at
> sun.security.ssl.OutputRecord.encrypt([email protected]/OutputRecord.java:345)
> at
> sun.security.ssl.SSLEngineOutputRecord.encode([email protected]/SSLEngineOutputRecord.java:287)
> at
> sun.security.ssl.SSLEngineOutputRecord.encode([email protected]/SSLEngineOutputRecord.java:189)
> at
> sun.security.ssl.SSLEngineImpl.encode([email protected]/SSLEngineImpl.java:285)
> at
> sun.security.ssl.SSLEngineImpl.writeRecord([email protected]/SSLEngineImpl.java:231)
> at
> sun.security.ssl.SSLEngineImpl.wrap([email protected]/SSLEngineImpl.java:136)
> - eliminated <0x00007f183702cb78> (a sun.security.ssl.SSLEngineImpl)
> at
> sun.security.ssl.SSLEngineImpl.wrap([email protected]/SSLEngineImpl.java:116)
> - locked <0x00007f183702cb78> (a sun.security.ssl.SSLEngineImpl)
> at javax.net.ssl.SSLEngine.wrap([email protected]/SSLEngine.java:522)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1071)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:843)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:125)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.writeAndFlushNextMessageIfPossible(PartitionRequestQueue.java:246)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.enqueueAvailableReader(PartitionRequestQueue.java:110)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.userEventTriggered(PartitionRequestQueue.java:173)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:365)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
> at
> org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:365)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1428)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:913)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue.lambda$notifyReaderNonEmpty$0(PartitionRequestQueue.java:89)
> at
> org.apache.flink.runtime.io.network.netty.PartitionRequestQueue$$Lambda$1875/0x00007efc8a13b4b0.run(Unknown
> Source)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> at
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
> at
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at java.lang.Thread.run([email protected]/Thread.java:829) {code}
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)