[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694147#comment-17694147
 ] 

Guozhang Wang commented on KAFKA-14748:
---------------------------------------

Originally I thought this could be treated as a fix in join semantics, without 
a KIP. But thinking about it again, I realized that may not be the case, 
primarily because we then cannot distinguish the following two cases:

1) The key extractor returns a non-null `K0`, and the lookup finds a matching 
record for `K0` with a null `V0`, resulting in `<K, join(V, null)>`.

2) The key extractor returns a null `K0`, and hence we directly emit `<K, 
join(V, null)>`.

Hence, adding a `filter` operator after the `join` operator to drop `<K, 
join(V, null)>` results cannot by itself preserve the old behavior if a 
developer really wants that, since the filter would also drop the results of 
case 1.
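
The ambiguity can be illustrated with a small plain-Java sketch. It does not 
use the Kafka Streams API: `leftJoin`, `extractor`, and the `HashMap` standing 
in for the other relation are hypothetical names, and the relaxed null-FK 
behavior is assumed to work as proposed in this ticket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

class NullFkAmbiguity {

    // Hypothetical model of the relaxed FK left-join: a null extracted key
    // is joined against a null right-hand value instead of being dropped.
    static <V, K0, V0, R> R leftJoin(V left,
                                     Function<V, K0> extractor,
                                     Map<K0, V0> rightTable,
                                     BiFunction<V, V0, R> joiner) {
        K0 fk = extractor.apply(left);
        // Null FK: no lookup at all. Non-null FK: the lookup may find nothing.
        V0 right = (fk == null) ? null : rightTable.get(fk);
        return joiner.apply(left, right);
    }

    public static void main(String[] args) {
        Map<String, String> rightTable = new HashMap<>();
        rightTable.put("k0-present", "v0");

        BiFunction<String, String, String> joiner = (v, v0) -> v + "|" + v0;

        // Case 1: non-null FK, but the other relation has no value for it.
        String case1 = leftJoin("v", v -> "k0-missing", rightTable, joiner);
        // Case 2: the extractor itself returns null.
        String case2 = leftJoin("v", v -> (String) null, rightTable, joiner);

        System.out.println(case1);               // v|null
        System.out.println(case2);               // v|null
        System.out.println(case1.equals(case2)); // true
    }
}
```

Both calls hand `null` to the joiner, so a downstream filter only ever sees 
identical results and has nothing left to tell the two cases apart.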

In fact, the same question applies to the general issue of 
https://issues.apache.org/jira/browse/KAFKA-12317 as well: should we try to 
distinguish between the case of extracting a null key for the join vs. the 
case where a non-null extracted key did not find a matching record in the 
other relation (or more specifically, the other relation returns a null value 
for the extracted key)?

My thoughts on the above question are as follows: putting performance benefits 
aside, for app semantics where the developer knows there are certain keys that 
would never exist in the other relation (i.e. would always return a null 
value), the developer could let the key extractor return those keys when they 
want to produce no-match join results. That means the value of 
KAFKA-12317/KAFKA-14748 lies in the case where the developer does not know of 
any keys that would never exist in the other relation.
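
That workaround could look roughly like the following, again in plain Java 
rather than the Kafka Streams API. `NO_MATCH_KEY` and `extractForeignKey` are 
hypothetical names, and the sketch assumes the developer knows that this key 
will never be present in the other relation.

```java
import java.util.HashMap;
import java.util.Map;

class SentinelForeignKey {

    // Assumption: this key is known never to exist in the other relation.
    static final String NO_MATCH_KEY = "__no_match__";

    // Hypothetical extractor: never returns null, maps "no FK" to the sentinel.
    static String extractForeignKey(String rawFk) {
        return rawFk == null ? NO_MATCH_KEY : rawFk;
    }

    public static void main(String[] args) {
        Map<String, String> rightTable = new HashMap<>();
        rightTable.put("k0", "v0");

        String fk = extractForeignKey(null);
        // The lookup takes the normal non-null path and simply finds nothing.
        String right = rightTable.get(fk);
        System.out.println(fk + " -> " + right); // __no_match__ -> null
    }
}
```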

If we want to change to the behavior that does not distinguish these two 
cases, I'd suggest we add a flag config to enable it across fk/outer/left 
joins, and remove the flag (i.e. always enable the behavior) once we have not 
heard people complain about the behavior change for a while. But this would 
require a KIP.

> Relax non-null FK left-join requirement
> ---------------------------------------
>
>                 Key: KAFKA-14748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14748
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.



