[jira] [Updated] (FLINK-8526) When parallelism equals half the number of task slots, join and shuffle operators easily cause deadlock.

zhu.qing (JIRA) Tue, 30 Jan 2018 18:50:44 -0800

     [ https://issues.apache.org/jira/browse/FLINK-8526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


zhu.qing updated FLINK-8526:
----------------------------
    Description: The attached program gets stuck at certain parallelism
settings. With parallelism set to 80 in the environment described for this
issue, the program always hangs; with parallelism set to 100, everything runs
fine. From my investigation, when the parallelism equals the number of task
slots the program does not run fastest, probably because there are not enough
network buffers. How are network buffers related to parallelism, and how does
parallelism relate to the number of running tasks? (In other words, we have
160 task slots, but the number of running tasks can be far higher than the
number of task slots.)

> When parallelism equals half the number of task slots, join and shuffle
> operators easily cause deadlock.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8526
>                 URL: https://issues.apache.org/jira/browse/FLINK-8526
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Java API, Local Runtime
>    Affects Versions: 1.4.0
>         Environment: 8 machines (96 GB and 24 cores) with 20 task slots per
> TaskManager. twitter-2010 dataset. Parallelism set to 80. Code run in
> standalone mode.
>            Reporter: zhu.qing
>            Priority: Major
>         Attachments: T2AdjActiveV.java, T2AdjMessage.java
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The attached program gets stuck at certain parallelism settings. With
> parallelism set to 80 in the environment described for this issue, the
> program always hangs; with parallelism set to 100, everything runs fine.
> From my investigation, when the parallelism equals the number of task slots
> the program does not run fastest, probably because there are not enough
> network buffers. How are network buffers related to parallelism, and how
> does parallelism relate to the number of running tasks? (In other words, we
> have 160 task slots, but the number of running tasks can be far higher than
> the number of task slots.)
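
As background for the network buffer question, the Flink configuration
documentation gives a rule of thumb for the number of network buffers
(configured per TaskManager): #slots-per-TM^2 * #TMs * 4. The sketch below
applies it to the environment in this report (20 slots per TaskManager, 8
TaskManagers). The class and method names are hypothetical, the factor 4 is
the documentation's heuristic rather than a measured value for the attached
job, and the 32 KiB buffer size assumes the default memory segment size. It
is an estimate only, not a diagnosis of the reported hang.

```java
public class NetworkBufferEstimate {

    // Assumed default size of one network buffer (memory segment), in bytes.
    static final long BUFFER_SIZE_BYTES = 32L * 1024L;

    // Rule of thumb from the Flink configuration docs:
    // slotsPerTaskManager^2 * taskManagers * 4.
    public static long estimateBuffers(int slotsPerTaskManager, int taskManagers) {
        return (long) slotsPerTaskManager * slotsPerTaskManager * taskManagers * 4L;
    }

    // Memory needed to hold that many buffers, in MiB.
    public static long estimateMebibytes(int slotsPerTaskManager, int taskManagers) {
        long bytes = estimateBuffers(slotsPerTaskManager, taskManagers) * BUFFER_SIZE_BYTES;
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Values taken from the Environment field of this issue.
        int slotsPerTaskManager = 20;
        int taskManagers = 8;
        System.out.println(estimateBuffers(slotsPerTaskManager, taskManagers)
                + " buffers, about "
                + estimateMebibytes(slotsPerTaskManager, taskManagers) + " MiB");
    }
}
```

If the configured network memory is well below such an estimate, raising it
is one thing to try before investigating further.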



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
