ayush-san opened a new issue #2610:
URL: https://github.com/apache/iceberg/issues/2610


   I am using Flink CDC to stream CDC changes into an Iceberg table. For some rows, I am getting duplicate rows in the table even though I passed `equalityFieldColumns` when writing:
   
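   ```java
   import org.apache.iceberg.flink.FlinkSchemaUtil;
   import org.apache.iceberg.flink.sink.FlinkSink;

   // Write the CDC RowData stream into the Iceberg table; the equality field
   // columns identify the row key that -U/-D records should delete.
   FlinkSink.forRowData(rowDataDataStream)
                   .table(icebergTable)
                   .tableSchema(FlinkSchemaUtil.toSchema(FlinkSchemaUtil.convert(icebergTable.schema())))
                   .tableLoader(tableLoader)
                   .equalityFieldColumns(tableConfig.getEqualityColumns())
                   .build();
   ```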
   I have verified on both the Debezium and Flink sides that they are not producing duplicate rows. Here is the Flink DataStream output:
   
   ```
   +I(1616881,1293386,invoice,XXXXXXXX.....)
   -U(1616881,1293386,invoice,XXXXXXXX.....)
   +U(1616881,1293386,invoice,XXXXXXXX.....)
   -U(1616881,1293386,invoice,XXXXXXXX.....)
   +U(1616881,1293386,invoice,XXXXXXXX.....)
   -U(1616881,1293386,invoice,XXXXXXXX.....)
   +U(1616881,1293386,invoice,XXXXXXXX.....)
   -U(1616881,1293386,invoice,XXXXXXXX.....)
   +U(1616881,1293386,invoice,XXXXXXXX.....)
   -U(1616881,1293386,invoice,XXXXXXXX.....)
   +U(1616881,1293386,invoice,XXXXXXXX.....)
   ```
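For reference, the changelog above should net out to a single row for this key. The following is a minimal, hypothetical Java sketch of those upsert semantics, not Iceberg's implementation: it assumes the first column (`1616881`) is the equality key, and the names `ChangelogUpsertSketch`, `Kind`, and `Change` are made up for illustration (`Kind` mirrors the +I/-U/+U/-D row kinds of Flink's `RowKind`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogUpsertSketch {

    // Mirrors the four row kinds of Flink's RowKind: +I, -U, +U, -D.
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // Hypothetical change record: row kind, assumed equality key, remaining payload.
    static final class Change {
        final Kind kind;
        final long key;
        final String payload;

        Change(Kind kind, long key, String payload) {
            this.kind = kind;
            this.key = key;
            this.payload = payload;
        }
    }

    // Applies a changelog to an in-memory "table" keyed by the equality key:
    // +I / +U upsert the row for that key, -U / -D remove it.
    static Map<Long, String> apply(List<Change> changes) {
        Map<Long, String> table = new LinkedHashMap<>();
        for (Change c : changes) {
            switch (c.kind) {
                case INSERT:
                case UPDATE_AFTER:
                    table.put(c.key, c.payload);
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    table.remove(c.key);
                    break;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Rebuild the stream shown above: one +I followed by five -U/+U pairs.
        List<Change> changes = new ArrayList<>();
        changes.add(new Change(Kind.INSERT, 1616881L, "invoice"));
        for (int i = 0; i < 5; i++) {
            changes.add(new Change(Kind.UPDATE_BEFORE, 1616881L, "invoice"));
            changes.add(new Change(Kind.UPDATE_AFTER, 1616881L, "invoice"));
        }
        System.out.println(apply(changes)); // prints {1616881=invoice}
    }
}
```

This only models the logical end state a reader of the table would expect; it does not model how the Iceberg Flink sink actually encodes these changes as data files and equality-delete files.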
   
   Here is the query result for the same ID in Spark SQL:
   
![image](https://user-images.githubusercontent.com/57655135/118756206-0db59880-b888-11eb-9006-5e64b25d5149.png)
   
   I am facing this issue in most of my tables, but only for some rows. What could be the reason for this? Will it be solved by https://github.com/apache/iceberg/pull/2410?
   

