[jira] [Commented] (KAFKA-4835) Avoid repartitioning when key change doesn't change partitions

Levani Kokhreidze (Jira) Mon, 02 Mar 2020 14:32:56 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049715#comment-17049715
 ]


Levani Kokhreidze commented on KAFKA-4835:
------------------------------------------

Hey [~vvcephei],

 

Indeed this ticket is something that shouldn't be connected to KIP-221. I'll 
unlink it from the KIP and stop the progress on it.

I don't plan to add anything new to existing PR 
([https://github.com/apache/kafka/pull/7170]) as it's already quite big.

Thanks for noticing this.

 

Regards,

Levani

> Avoid repartitioning when key change doesn't change partitions
> --------------------------------------------------------------
>
>                 Key: KAFKA-4835
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4835
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Michal Borowiecki
>            Priority: Major
>              Labels: kip
>             Fix For: 2.6.0
>
>
> From 
> https://issues.apache.org/jira/browse/KAFKA-4601?focusedCommentId=15881030&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15881030
> ...it would be good to provide users more control over the repartitioning. 
>  My use case is as follows (unrelated bits omitted for brevity):
> {code:java}
>               KTable<String, Activity> loggedInCustomers = builder
>                       .stream("customerLogins")
>                       .groupBy((key, activity) -> 
>                               activity.getCustomerRef())
>                       .reduce((first,second) -> second, loginStore());
>               
>               builder
>                       .stream("balanceUpdates")
>                       .map((key, activity) -> new KeyValue<>(
>                               activity.getCustomerRef(),
>                               activity))
>                       .join(loggedInCustomers, (activity, session) -> ...
>                       .to("sessions");
> {code}
> Both "groupBy" and "map" in the underlying implementation set the 
> repartitionRequired flag (since the key changes), and the aggregation/join 
> that follows will create the repartitioned topic.
>  However, in our case I know that both input streams are already partitioned 
> by the customerRef value, which I'm mapping into the key (because it's 
> required by the join operation).
>  So there are 2 unnecessary intermediate topics created with their associated 
> overhead, while the ultimate goal is simply to do a join on a value that we 
> already use to partition the original streams anyway.
>  (Note, we don't have the option to re-implement the original input streams 
> to make customerRef the message key.)
> I think it would be better to allow the user to decide (from their knowledge 
> of the incoming streams) whether a repartition is mandatory on aggregation 
> and join operations (overloaded version of the methods with the 
> repartitionRequired flag exposed maybe?)
>  An alternative would be to allow users to perform a join on a value other 
> than the key (a keyValueMapper parameter to join, like the one used for joins 
> with global tables), but I expect that to be more involved and error-prone to 
> use for people who don't understand the partitioning requirements well 
> (whereas it's safe for global tables).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (KAFKA-4835) Avoid repartitioning when key change doesn't change partitions

Reply via email to