Sajeev Ramakrishnan created FLINK-4902: ------------------------------------------
Summary: Flink Task Chain not getting input in a distributed manner Key: FLINK-4902 URL: https://issues.apache.org/jira/browse/FLINK-4902 Project: Flink Issue Type: Bug Components: DataSet API Affects Versions: 1.1.0 Environment: RHEL 6.6 Reporter: Sajeev Ramakrishnan Dear Team, I have the following tasks chained as a single subtask. left outer join -> filter -> map -> flatMap. The input to this would be two streams memberPlan - 22 million groupPlan - 1 million. I am running the entire job with parallelism 16. Before this task chain, I am doing two left outer joins. The problem is that one slot is getting 22 million and rest 15 slots are getting the input from groupPlan. This is making the entire execution very slow, probably 4 hours slower. Can you please throw some light on this. Regards, Sajeev -- This message was sent by Atlassian JIRA (v6.3.4#6332)