rdblue commented on PR #5061:
URL: https://github.com/apache/iceberg/pull/5061#issuecomment-1160585124

   To me, it looks like the problem is that the join is producing CDC records. 
A join would normally not produce output records that don't match the filter, 
so it is strange to have `+I[2, ...]` and `-D[2, ...]` in the join output.
   
   I'm not sure how to solve this, but it looks like a problem with Flink. 
Flink shouldn't produce CDC rows for a straightforward join. It should just 
produce the regular SQL output set, which is:
   
   ```
   id | data
   ---+---------
    1 | 20220607
    3 | 20220505
   ```
   
   > We are currently using the new Sink API to fix it temporarily, FYI #4904 
(add a configuration that specifies whether to enable the new version sink), 
that use sink instead of transform, but in this way, we can't get the CDC data.
   
   Do you mean that the new sink doesn't pass the `RowKind` and you only get in 
inserted rows?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to