kbendick commented on PR #5061:
URL: https://github.com/apache/iceberg/pull/5061#issuecomment-1163873667

   > > In Flink SQL, data is transferred from one operator to another as a 
Changelog Stream.
   > 
   > The stream of changes from the debugger screenshot appears incorrect to 
me. Even if join outputs are modeled this way, there should not be a delete and 
an insert for ID 2. Why would an unmatched row produce anything from the join?
   
   I believe we get a +I and -D row in the changelog because a row for ID=2 
entered the join and then was not selected for the join. 
   
   Since we’re selecting from `upsert_sample` as the left side of the join (aka 
t1), and it contains the row `('2','20220503')`, I am wondering if the `unique` 
keyword adds an operator to the DAG that forces the left join to consider 
`('2','20220503')` during the join (and thus emit a temporary +I record) 
because of the enforced uniqueness constraint.
   
   So id=2 is inserted due to its existence in upsert_sample and then deleted 
because it’s not an output of the join. But it is an input of the join. Is that 
not what’s happening in the debugger output?
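   
   To make that reading concrete, here is a rough Python sketch (not Flink code) of how I would expect a downstream consumer to fold such a changelog into a table. The `materialize` helper and the id=3 row are hypothetical and only there for illustration; the only things taken from Flink are the short row-kind markers (+I, -U, +U, -D).

```python
from typing import Dict, Iterable, Tuple

# Short row-kind markers as they appear in a Flink changelog:
# +I insert, -U update-before, +U update-after, -D delete.
Change = Tuple[str, str, str]  # (kind, id, data)


def materialize(changelog: Iterable[Change]) -> Dict[str, str]:
    """Fold a changelog into the table it describes, keyed by id (hypothetical helper)."""
    table: Dict[str, str] = {}
    for kind, row_id, data in changelog:
        if kind in ("+I", "+U"):
            # Insert or update-after: the row becomes the current value for this key.
            table[row_id] = data
        elif kind in ("-U", "-D"):
            # Retraction: drop the row only if it matches what is currently stored.
            if table.get(row_id) == data:
                del table[row_id]
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


# Assumed changelog: the id=3 row is made up; id=2 mirrors the row discussed above.
changelog = [
    ("+I", "3", "20220101"),
    ("+I", "2", "20220503"),
    ("-D", "2", "20220503"),
]
result = materialize(changelog)
print(result)
```

   If that model is right, the +I/-D pair for id=2 cancels out and the row never shows up in the final result, even though it is visible in the intermediate stream.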
   
   Also @hililiwei, what happens if you remove the `UNIQUE` constraint? Is that 
even possible for `id` to be the primary key? And what happens if you use 
`select *` instead of specifying t1.id and t1.data?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

