[ 
https://issues.apache.org/jira/browse/FLINK-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249627#comment-16249627
 ] 

Piotr Nowojski commented on FLINK-7845:
---------------------------------------

I am 99.9% sure the IllegalAccessError is unrelated to any memory leak, and I'm 
investigating it right now.

Memory usage of your test is stable for me (please check the screenshot 
attached to the issue). The only problem I have seen is that after many 
iterations I got this error:

Caused by: java.io.IOException: Insufficient number of network buffers: 
required 8, but only 4 available. The total number of network buffers is 
currently set to 11519 of 32768 bytes each. You can increase this number by 
setting the configuration keys 'taskmanager.network.memory.fraction', 
'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
    at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:257)
    at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:199)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:618)
    at java.lang.Thread.run(Thread.java:748)

but it is almost certainly caused by the ever-increasing job size. Each 
subsequent iteration has more and more tasks, which is clearly visible in the 
logs. I'm not sure whether only the plan between two execute() calls is 
executed (you can easily test it); however, look at the following lines in 
your code:

{code:java}
if (entitonTuples == null) {
    entitonTuples = dsQuads;
} else {
    entitonTuples = entitonTuples.union(dsQuads);
}
{code}

After the first iteration you always make a union with the results of the 
previous iterations. I bet this is the reason for the growing job graph.
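
To illustrate the suspected effect, here is a minimal sketch in plain Java. It 
does not use Flink at all: Node is a hypothetical stand-in for a plan operator, 
and planSize is a hypothetical helper that counts the operators reachable from 
the accumulated result. Only the names entitonTuples and dsQuads come from your 
code; everything else is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionGrowthSketch {

    /** Toy stand-in for a plan operator. This is NOT a Flink class. */
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();

        Node(String name, Node... inputs) {
            this.name = name;
            this.inputs.addAll(Arrays.asList(inputs));
        }
    }

    /** Counts the distinct operators reachable from the given node. */
    static int planSize(Node sink) {
        Set<Node> seen = new HashSet<>();
        collect(sink, seen);
        return seen.size();
    }

    private static void collect(Node node, Set<Node> seen) {
        if (seen.add(node)) {
            for (Node input : node.inputs) {
                collect(input, seen);
            }
        }
    }

    /** Mimics the loop from the test: the accumulator is never reset. */
    static int[] planSizesPerIteration(int iterations) {
        int[] sizes = new int[iterations];
        Node entitonTuples = null;
        for (int i = 0; i < iterations; i++) {
            Node dsQuads = new Node("dsQuads-" + i);
            if (entitonTuples == null) {
                entitonTuples = dsQuads;
            } else {
                // Each union keeps a reference to everything built so far.
                entitonTuples = new Node("union-" + i, entitonTuples, dsQuads);
            }
            sizes[i] = planSize(entitonTuples);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = planSizesPerIteration(5);
        for (int i = 0; i < sizes.length; i++) {
            System.out.println("iteration " + i + ": " + sizes[i] + " operators");
        }
    }
}
```

In this toy model the operator count grows by two per iteration (1, 3, 5, 7, 
9), because every union drags the whole previous plan along. Whether the real 
job graph grows in the same way is something the logs would have to confirm.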

> Netty Exception when submitting batch job repeatedly
> ----------------------------------------------------
>
>                 Key: FLINK-7845
>                 URL: https://issues.apache.org/jira/browse/FLINK-7845
>             Project: Flink
>          Issue Type: Bug
>          Components: Core, Network
>    Affects Versions: 1.3.2
>            Reporter: Flavio Pompermaier
>         Attachments: Screen Shot 2017-11-13 at 14.54.38.png
>
>
> We had some problems with Flink and Netty so we wrote a small unit test to 
> reproduce the memory issues we have in production. It happens that we have to 
> restart the Flink cluster because the memory is always increasing from job to 
> job. 
> The github project is https://github.com/okkam-it/flink-memory-leak and the 
> JUnit test is contained in the MemoryLeakTest class (within src/main/test).
> I don't know if this is the root of our problems but at some point, usually 
> around the 28th loop, the job fails with the following exception (actually we 
> never faced that in production but maybe is related to the memory issue 
> somehow...):
> {code:java}
> Caused by: java.lang.IllegalAccessError: 
> org/apache/flink/runtime/io/network/netty/NettyMessage
>       at 
> io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
>       at 
> io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
>       at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
>       ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
