[ 
https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127426#comment-17127426
 ] 

josson paul kalapparambath commented on FLINK-17560:
----------------------------------------------------

[~xintongsong] [~chesnay]

We are using customized Flink. I do have modification in the JobManager 
scheduler code. I fixed the ConcurrentModificationException which was happening 
in the Job Manager code. But the original issue still happens. 

If you see my above message (also I have attached the thread dump), you can see 
that Tasks are getting stuck for ever in the JVM and the 'finally' method for 
those 'Tasks' are never called. Because of this, the slots will never go into 
'FREE' state. 

Anybody reported this issue before. This happens only if number of threads in 
Task manger is around 950. 

 

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17560
>                 URL: https://issues.apache.org/jira/browse/FLINK-17560
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.8.3
>         Environment: Flink verson 1.8.3
> Session cluster
>            Reporter: josson paul kalapparambath
>            Priority: Major
>         Attachments: jobmgr.log, threaddump-tm.txt, tm.log
>
>
> Set up
> ------
> Flink verson 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (Same Node)
> 1 TaskManager
> 4 pipelines running with various parallelism's
> Issue
> ------
> Occationally when the Job Manager gets restarted we noticed that all the 
> pipelines are not getting scheduled. The error that is reporeted by the Job 
> Manger is 'not enough slots are available'. This should not be the case 
> because task manager was deployed with sufficient slots for the number of 
> pipelines/parallelism we have.
> We further noticed that the slot report sent by the taskmanger contains solts 
> filled with old CANCELLED job Ids. I am not sure why the task manager still 
> holds the details of the old jobs. Thread dump on the task manager confirms 
> that old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865. But this is 
> not the issue happening in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to