[ https://issues.apache.org/jira/browse/FLINK-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-12294:
-----------------------------------
    Labels: auto-deprioritized-major performance  (was: performance stale-major)

> Kafka connector, work with grouping partitions
> ----------------------------------------------
>
>                 Key: FLINK-12294
>                 URL: https://issues.apache.org/jira/browse/FLINK-12294
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka, Runtime / Task
>            Reporter: Sergey
>            Priority: Major
>              Labels: auto-deprioritized-major, performance
>         Attachments: KeyGroupAssigner.java, KeyGroupRangeAssignment.java
>
>
> Add a flag (default false) that indicates whether the topic partitions are
> already grouped by the key. When the flag is set to true, skip the
> unnecessary shuffle/re-sorting step. For example, say we have clients'
> payment transactions in a Kafka topic. Records are grouped by clientId (all
> transactions with the same clientId go to one Kafka topic partition), and
> the task is to find the maximum transaction per client in sliding windows.
> In map/reduce terms there is no need to shuffle data between all topic
> consumers; it may be worth doing the work within each consumer instead,
> gaining some speedup from increasing the number of executors working on
> each partition's data. With N messages per partition, this takes just N
> operations instead of N*ln(N) (the current implementation with
> shuffle/re-sorting). For windows with thousands of events, that is roughly
> a tenfold gain in execution speed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
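
The per-partition aggregation proposed in the issue description can be illustrated with a minimal, library-free sketch. It is not Flink or Kafka connector code, and it is not taken from the attached files: the class name `PartitionLocalMax`, the `Payment` type, and both methods are hypothetical. Windowing is left out for brevity. The sketch assumes what the proposed flag would assert, namely that every clientId appears in exactly one partition, so each consumer can aggregate its own partition in one pass and the results can be unioned without a shuffle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink or Kafka.
final class PartitionLocalMax {

    /** One payment record as read from a single Kafka partition (illustrative type). */
    static final class Payment {
        final String clientId;
        final long amount;

        Payment(String clientId, long amount) {
            this.clientId = clientId;
            this.amount = amount;
        }
    }

    /** Single pass over one partition: max amount per client, no data leaves the consumer. */
    static Map<String, Long> maxPerClient(List<Payment> partitionRecords) {
        Map<String, Long> max = new HashMap<>();
        for (Payment p : partitionRecords) {
            max.merge(p.clientId, p.amount, Long::max);
        }
        return max;
    }

    /**
     * Union of the per-partition results. This is valid only under the assumption
     * that each clientId lives in exactly one partition; a violation is reported.
     */
    static Map<String, Long> combine(List<Map<String, Long>> perPartition) {
        Map<String, Long> all = new HashMap<>();
        for (Map<String, Long> partitionResult : perPartition) {
            for (Map.Entry<String, Long> e : partitionResult.entrySet()) {
                if (all.putIfAbsent(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException(
                            "clientId found in more than one partition: " + e.getKey());
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Two partitions, already grouped by clientId on the producer side.
        List<Payment> partition0 = List.of(
                new Payment("a", 10), new Payment("a", 30), new Payment("b", 5));
        List<Payment> partition1 = List.of(
                new Payment("c", 7), new Payment("c", 2));

        System.out.println(combine(List.of(
                maxPerClient(partition0), maxPerClient(partition1))));
    }
}
```

The check in `combine` makes the assumption explicit: if the topic is not actually grouped by key, skipping the shuffle would silently produce wrong per-client results, which is presumably why the issue proposes the flag as opt-in with a default of false.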