[ https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-6049: --------------------------------- Description: When multiple streams aggregate together to form a single larger object (e.g. a shopping website may have a cart stream, a wish list stream, and a purchases stream. Together they make up a Customer), it is very difficult to accommodate this in the Kafka-Streams DSL: it generally requires you to group and aggregate all of the streams to KTables then make multiple outer join calls to end up with a KTable with your desired object. This will create a state store for each stream and a long chain of ValueJoiners that each new record must go through to get to the final object. Creating a cogroup method where you use a single state store will: * Reduce the number of gets from state stores. With the multiple joins when a new value comes into any of the streams a chain reaction happens where the join processor keep calling ValueGetters until we have accessed all state stores. * Slight performance increase. As described above all ValueGetters are called also causing all ValueJoiners to be called forcing a recalculation of the current joined value of all other streams, impacting performance. was: When multiple streams aggregate together to form a single larger object (e.g. A shopping website may have a cart stream, a wish list stream, and a purchases stream. Together they make up a Customer), it is very difficult to accommodate this in the Kafka-Streams DSL: it generally requires you to group and aggregate all of the streams to KTables then make multiple outer join calls to end up with a KTable with your desired object. This will create a state store for each stream and a long chain of ValueJoiners that each new record must go through to get to the final object. Creating a cogroup method where you use a single state store will: * Reduce the number of gets from state stores. With the multiple joins when a new value comes into any of the streams a chain reaction happens where the join processor keep calling ValueGetters until we have accessed all state stores. * Slight performance increase. As described above all ValueGetters are called also causing all ValueJoiners to be called forcing a recalculation of the current joined value of all other streams, impacting performance. > Kafka Streams: Add Cogroup in the DSL > ------------------------------------- > > Key: KAFKA-6049 > URL: https://issues.apache.org/jira/browse/KAFKA-6049 > Project: Kafka > Issue Type: New Feature > Components: streams > Reporter: Guozhang Wang > Labels: api, needs-kip, user-experience > > When multiple streams aggregate together to form a single larger object (e.g. > a shopping website may have a cart stream, a wish list stream, and a > purchases stream. Together they make up a Customer), it is very difficult to > accommodate this in the Kafka-Streams DSL: it generally requires you to group > and aggregate all of the streams to KTables then make multiple outer join > calls to end up with a KTable with your desired object. This will create a > state store for each stream and a long chain of ValueJoiners that each new > record must go through to get to the final object. > Creating a cogroup method where you use a single state store will: > * Reduce the number of gets from state stores. With the multiple joins when a > new value comes into any of the streams a chain reaction happens where the > join processor keep calling ValueGetters until we have accessed all state > stores. > * Slight performance increase. As described above all ValueGetters are called > also causing all ValueJoiners to be called forcing a recalculation of the > current joined value of all other streams, impacting performance. -- This message was sent by Atlassian JIRA (v6.4.14#64029)