FYI: The difference between `groupBy` (may trigger repartitioning) and `groupByKey` (does not trigger repartitioning unless it is preceded by a key-changing operation) also applies to:
- `map` vs. `mapValues`
- `flatMap` vs. `flatMapValues`

On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy <damian....@gmail.com> wrote:

> If you use stream.groupByKey() then there will be no repartitioning as long
> as there have been no key-changing operations preceding it, i.e., map,
> selectKey, flatMap, transform. If you use stream.groupBy(...) then we see
> it as a key-changing operation, hence we need to repartition the data.
>
> On Wed, 1 Mar 2017 at 18:59 Tianji Li <skyah...@gmail.com> wrote:
>
> > Hi there,
> >
> > I wonder if it makes sense to give the option to disable auto
> > repartitioning while doing groupBy.
> >
> > I understand that with https://issues.apache.org/jira/browse/KAFKA-3561,
> > an internal topic for repartitioning will be automatically created and
> > synced to brokers, which is useful when aggregation keys are not the ones
> > used when ingesting raw data.
> >
> > However, in my case, I have carefully partitioned the data when ingesting
> > my raw topics. If I do groupBy followed by aggregation, there will be TWO
> > changelog topics, one for groupBy and another for aggregation.
> >
> > Does it make sense to make the groupBy one configurable?
> >
> > Thanks
> > Tianji
>

--
*Michael G. Noll*
Product Manager | Confluent
+1 650 453 5860 | @miguno <https://twitter.com/miguno>
Follow us: Twitter <https://twitter.com/ConfluentInc> | Blog <http://www.confluent.io/blog>
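The rule Damian describes can be sketched as a toy model that tracks a single flag: "may an upstream operation have changed the record key?" `ToyStream` and `RepartitionModel` are hypothetical names, not Kafka Streams classes. This is an illustration of the rule stated in this thread under that assumption, not the library's actual implementation.

```java
// Hypothetical toy model of the repartitioning rule described in this thread.
// ToyStream is NOT a Kafka Streams class; it only tracks one boolean:
// "may an upstream operation have changed the record key?"
public class RepartitionModel {

    static final class ToyStream {
        private final boolean keyMayHaveChanged;

        ToyStream(boolean keyMayHaveChanged) {
            this.keyMayHaveChanged = keyMayHaveChanged;
        }

        // Key-changing operations named in the thread set the flag.
        ToyStream map()       { return new ToyStream(true); }
        ToyStream flatMap()   { return new ToyStream(true); }
        ToyStream selectKey() { return new ToyStream(true); }
        ToyStream transform() { return new ToyStream(true); }

        // Value-only operations leave the key untouched, so the flag is carried over.
        ToyStream mapValues()     { return new ToyStream(keyMayHaveChanged); }
        ToyStream flatMapValues() { return new ToyStream(keyMayHaveChanged); }

        // groupByKey() repartitions only if the key may have changed upstream.
        boolean groupByKeyRepartitions() { return keyMayHaveChanged; }

        // groupBy(...) derives a new key, so it is always treated as key-changing.
        boolean groupByRepartitions() { return true; }
    }

    // A stream read straight from an input topic keeps that topic's partitioning.
    static ToyStream source() {
        return new ToyStream(false);
    }

    public static void main(String[] args) {
        System.out.println("mapValues -> groupByKey: " + source().mapValues().groupByKeyRepartitions());
        System.out.println("map       -> groupByKey: " + source().map().groupByKeyRepartitions());
        System.out.println("mapValues -> groupBy:    " + source().mapValues().groupByRepartitions());
    }
}
```

Against the real API, the first line corresponds to `stream.mapValues(...).groupByKey()` (no repartition) and the second to `stream.map(...).groupByKey()` (repartition), even if the `map` happens to leave the key unchanged, because the DSL cannot inspect what the mapper does to the key.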