[jira] [Commented] (KAFKA-3705) Support non-key joining in KTable

Adam Bellemare (JIRA) Sun, 27 May 2018 17:56:28 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492200#comment-16492200
 ]


Adam Bellemare commented on KAFKA-3705:
---------------------------------------

I too am interested, as my org, like many others, has been liberating data out 
of relational databases into streams and finding the lack of foreign-key joins 
painful. To me, this is an impediment to adopting Kafka. Many companies may try 
this out, notice that it's not terribly easy to use, and then settle into a 
pattern of simply using Kafka as a means to transfer data between relational 
databases. Organizationally, this sort of Jira has a huge impact on how teams 
can organize and what they can build. Without it, relational data liberated 
into the event-driven world will continue to cause the same relational pains 
for everyone who uses it.

There are some expensive ways to mitigate this, but it would be much better to 
have data transformation patterns that match the table-shaped data produced by 
tools like Kafka Connect and Debezium.

The last follow-up I could find on this was on the mailing list, Feb 16, 2018:

[http://mail-archives.apache.org/mod_mbox/kafka-dev/201802.mbox/%[email protected]%3e]

Jan lists four points that they need help with, but it doesn't seem that any of 
the regular contributors following the PR replied. I suspect that it got lost 
in the mix. A lot of work seems to have been done already, and it appears to be 
heading in the right direction. I hope that someone familiar with the issue can 
help kick-start it again; for now, all I can really do myself is affirm that 
this would be an extremely beneficial feature. Thanks for all the work that 
everyone has done so far; I know that it can often go underappreciated.

> Support non-key joining in KTable
> ---------------------------------
>
>                 Key: KAFKA-3705
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3705
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: api
>
> Today in Kafka Streams DSL, KTable joins are only based on keys. If users 
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but 
> with a "foreign key" {{a}}, and assuming they are read from two topics which 
> are partitioned on {{a}} and {{b}} respectively, they need to do the 
> following pattern:
> {code}
> // now tableB' is partitioned on "a"
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...);
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already 
> partitioned on {{a}}, users still need to do the pre-aggregation in order to 
> make the two joining streams share the same key. This is a drawback in terms 
> of programmability and we should fix it.
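
The workaround quoted above can be modeled without a Kafka cluster. The sketch below is an illustration under stated assumptions, not the Kafka Streams API: each KTable is modeled as an in-memory Map, and the names ForeignKeyJoinSketch, BValue, rekeyByForeignKey and joinOnKey are hypothetical. It shows why the aggregation step is needed: several B rows can reference the same "a", so re-keying B on "a" has to fold them into a single value per key before an ordinary key-based join can apply.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// In-memory model of the re-key-then-join workaround; NOT the Kafka Streams API.
public class ForeignKeyJoinSketch {

    // Hypothetical value type for table B: each row carries the foreign key "a".
    static final class BValue {
        final String a;
        final String payload;

        BValue(String a, String payload) {
            this.a = a;
            this.payload = payload;
        }
    }

    // Step 1, mirroring tableB.groupBy(/* select "a" */).agg(...):
    // re-key table B on its foreign key, folding every B row (b -> payload)
    // that references the same "a" into one value, so tableB' is keyed on "a".
    static Map<String, Map<String, String>> rekeyByForeignKey(Map<String, BValue> tableB) {
        Map<String, Map<String, String>> tableBPrime = new HashMap<>();
        for (Map.Entry<String, BValue> row : tableB.entrySet()) {
            tableBPrime
                .computeIfAbsent(row.getValue().a, k -> new TreeMap<>())
                .put(row.getKey(), row.getValue().payload);
        }
        return tableBPrime;
    }

    // Step 2, mirroring tableA.join(tableB', joiner): an ordinary inner join on key "a".
    static <R> Map<String, R> joinOnKey(Map<String, String> tableA,
                                        Map<String, Map<String, String>> tableBPrime,
                                        BiFunction<String, Map<String, String>, R> joiner) {
        Map<String, R> result = new TreeMap<>();
        for (Map.Entry<String, String> row : tableA.entrySet()) {
            Map<String, String> matches = tableBPrime.get(row.getKey());
            if (matches != null) { // inner join: skip A rows that no B row references
                result.put(row.getKey(), joiner.apply(row.getValue(), matches));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> tableA = Map.of("a1", "customer-1", "a2", "customer-2");
        Map<String, BValue> tableB = Map.of(
            "b1", new BValue("a1", "order-1"),
            "b2", new BValue("a1", "order-2"),
            "b3", new BValue("a3", "order-3"));

        Map<String, String> joined = joinOnKey(
            tableA, rekeyByForeignKey(tableB),
            (aValue, bRows) -> aValue + " " + bRows.values());
        System.out.println(joined); // prints {a1=customer-1 [order-1, order-2]}
    }
}
```

This models only a static snapshot. In Kafka Streams, KTable#groupBy repartitions the data through an internal topic, and the aggregation on the resulting KGroupedTable takes both an adder and a subtractor so that a B row whose foreign key changes is removed from its old group; neither is modeled here.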



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
