MirrerZu commented on issue #6831:
URL: https://github.com/apache/seatunnel/issues/6831#issuecomment-2114698027

   Let's focus only on the insert behavior of the sink and Paimon.
   Consider a PK table that uses the deduplicate merge engine, e.g. `create table test(id int, tchar string, primary key(id) not enforced);`, containing the following data:
   | id | tchar |
   |---|---|
   | 1 | abc |
   | 2 | abc |
   | 3 | abc |
   
   If we then insert some rows whose primary keys already exist:
   | id | tchar |
   |---|---|
   | 1 | ccc|
   | 2 | ccc|
   
   querying this table (latest snapshot by default) should return the following, which is what we get when inserting with Flink/Spark:
   | id | tchar |
   |---|---|
   | 1 | ccc|
   | 2 | ccc|
   | 3 | abc |
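The expected table above follows from deduplicate semantics: for each primary key, the most recently written row replaces the earlier one. Below is a simplified in-memory model of that rule, assuming "last write wins" in input order. It ignores Paimon's sequence fields, buckets, and snapshots, and the helper `dedup_merge` is hypothetical, not a Paimon API.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str]  # (id, tchar)


def dedup_merge(*batches: Iterable[Row]) -> Dict[int, str]:
    """Hypothetical model of a deduplicate merge: per primary key, the last row written wins."""
    table: Dict[int, str] = {}
    for batch in batches:          # batches are applied in commit order
        for pk, tchar in batch:    # rows inside a batch are applied in input order
            table[pk] = tchar      # a later write overwrites the earlier value for the same key
    return table


existing = [(1, "abc"), (2, "abc"), (3, "abc")]
inserted = [(1, "ccc"), (2, "ccc")]
print(sorted(dedup_merge(existing, inserted).items()))
```

Under this model the printed result is `[(1, 'ccc'), (2, 'ccc'), (3, 'abc')]`, matching the Flink/Spark output.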
   
   When inserting with SeaTunnel's sink, it looks like Paimon has not correctly merged all of the data:
   | id | tchar |
   |---|---|
   | 1 | abc |
   | 2 | ccc|
   | 3 | abc |
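To pinpoint which keys were not merged, one can diff the expected result against what the query returns. This is a minimal sketch assuming both results have been collected into `{id: tchar}` dictionaries; `mismatched_keys` is a hypothetical helper, and the values are copied from the two tables above.

```python
from typing import Dict, List, Tuple


def mismatched_keys(expected: Dict[int, str], observed: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Return (pk, expected_value, observed_value) for every key that differs or is missing."""
    diffs = []
    for pk in sorted(expected.keys() | observed.keys()):
        exp = expected.get(pk, "<missing>")
        obs = observed.get(pk, "<missing>")
        if exp != obs:
            diffs.append((pk, exp, obs))
    return diffs


expected = {1: "ccc", 2: "ccc", 3: "abc"}  # Flink/Spark result
observed = {1: "abc", 2: "ccc", 3: "abc"}  # SeaTunnel sink result
print(mismatched_keys(expected, observed))
```

With these values only `id = 1` is reported (`[(1, 'ccc', 'abc')]`): its update was not applied, while `id = 2` was merged correctly.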
   
   I also tried querying with time travel, and none of the Paimon snapshots is correct; it looks as if some data was lost.
   That's why I think either something is wrong with the Paimon sink, OR the Paimon sink cannot be used in batch mode to insert data with duplicate PKs.

