[ 
https://issues.apache.org/jira/browse/FLINK-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sajeev Ramakrishnan updated FLINK-4902:
---------------------------------------
    Description: 
Dear Team,

  I have the following tasks chained as a single subtask.

left outer join -> filter -> map -> flatMap.

The input to this would be two streams 
memberPlan - 22 million
groupPlan - 1 million.

I am running the entire job with parallelism 16. Before this task chain, I am 
doing two left outer joins.

The problem is that one slot is getting 22 plus million (includes some from 
groupPlan) and rest 15 slots are getting the input from groupPlan.

This is making the entire execution very slow, probably 4 hours slower.

Can you please throw some light on this.

Regards,
Sajeev


  was:
Dear Team,

  I have the following tasks chained as a single subtask.

left outer join -> filter -> map -> flatMap.

The input to this would be two streams 
memberPlan - 22 million
groupPlan - 1 million.

I am running the entire job with parallelism 16. Before this task chain, I am 
doing two left outer joins.

The problem is that one slot is getting 22 million and rest 15 slots are 
getting the input from groupPlan.

This is making the entire execution very slow, probably 4 hours slower.

Can you please throw some light on this.

Regards,
Sajeev



> Flink Task Chain not getting input in a distributed manner
> ----------------------------------------------------------
>
>                 Key: FLINK-4902
>                 URL: https://issues.apache.org/jira/browse/FLINK-4902
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API
>    Affects Versions: 1.1.0
>         Environment: RHEL 6.6
>            Reporter: Sajeev Ramakrishnan
>
> Dear Team,
>   I have the following tasks chained as a single subtask.
> left outer join -> filter -> map -> flatMap.
> The input to this would be two streams 
> memberPlan - 22 million
> groupPlan - 1 million.
> I am running the entire job with parallelism 16. Before this task chain, I am 
> doing two left outer joins.
> The problem is that one slot is getting 22 plus million (includes some from 
> groupPlan) and rest 15 slots are getting the input from groupPlan.
> This is making the entire execution very slow, probably 4 hours slower.
> Can you please throw some light on this.
> Regards,
> Sajeev



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to