Sajeev Ramakrishnan created FLINK-4902:
------------------------------------------

             Summary: Flink Task Chain not getting input in a distributed manner
                 Key: FLINK-4902
                 URL: https://issues.apache.org/jira/browse/FLINK-4902
             Project: Flink
          Issue Type: Bug
          Components: DataSet API
    Affects Versions: 1.1.0
         Environment: RHEL 6.6
            Reporter: Sajeev Ramakrishnan


Dear Team,

  I have the following tasks chained as a single subtask.

left outer join -> filter -> map -> flatMap.

The chain takes two inputs:
memberPlan - 22 million records
groupPlan - 1 million records

I am running the entire job with a parallelism of 16. Before this task chain,
the job performs two left outer joins.

The problem is that one slot receives the 22 million memberPlan records, while
the remaining 15 slots receive the input from groupPlan.

This makes the entire execution very slow, probably about 4 hours slower.
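
To illustrate the kind of imbalance described above, here is a minimal sketch
in plain Java (it is not Flink code and does not use Flink's partitioner). It
assumes, purely for illustration, that records are assigned to the 16 parallel
subtasks by hashing a join key, and that the large input carries the same key
on every record. Whether that is what actually happens in this job is not
known; the class name SkewDemo, the key values, and the record counts are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkewDemo {

    // Simplified stand-in for hash partitioning (an assumption for this
    // sketch, NOT Flink's real partitioner): key hash modulo parallelism.
    public static int partitionFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Counts how many records each parallel subtask would receive.
    public static long[] countPerPartition(List<String> keys, int parallelism) {
        long[] counts = new long[parallelism];
        for (String key : keys) {
            counts[partitionFor(key, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int parallelism = 16;

        // Hypothetical skewed input: every record has the same join key,
        // so every record maps to the same partition.
        List<String> skewed = new ArrayList<>();
        for (int i = 0; i < 22_000; i++) {
            skewed.add("SAME_KEY");
        }

        // Hypothetical input with a distinct key per record, for comparison.
        List<String> distinct = new ArrayList<>();
        for (int i = 0; i < 22_000; i++) {
            distinct.add("key-" + i);
        }

        System.out.println("skewed:   "
                + Arrays.toString(countPerPartition(skewed, parallelism)));
        System.out.println("distinct: "
                + Arrays.toString(countPerPartition(distinct, parallelism)));
    }
}
```

With the skewed input, a single entry of the printed array holds all records
and the other 15 are zero, which resembles the one-slot behaviour seen in this
job. If key skew turns out not to be the cause, this sketch does not apply.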

Could you please shed some light on this?

Regards,
Sajeev




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
