bbasosuho opened a new issue, #4639:
URL: https://github.com/apache/iceberg/issues/4639

   Previously deleted rows reappear after running UPDATE/DELETE with spark-sql 
on a table that was loaded through the Iceberg Java API.
   
   **Iceberg Version : iceberg-spark-runtime-3.1_2.12-0.13.1.jar
   Spark Version : 3.1.2**
   
   
   First, load 3 rows of data with the Iceberg Java API.
   
![image](https://user-images.githubusercontent.com/20290980/165416371-f17c0f97-6529-4523-8452-25cdcf5ddd36.png)
   
   Then load 1 more row with the Iceberg Java API (id = 4).
   
![image](https://user-images.githubusercontent.com/20290980/165416365-3ae32cb1-417b-434e-8b5c-336dc9458dcf.png)
   
   
   Then delete 1 row with the Iceberg Java API (id = 2).
   
![image](https://user-images.githubusercontent.com/20290980/165416210-cc7ffec4-9f41-4e06-ad87-17a59bef3dad.png)
   
   Then query the table from spark-sql.
   Rows with IDs 1, 3, 4 are returned. (Expected)
   
   Then update through spark-sql (id = 1).
   (The same thing happens with DELETE.)
   
   Case 1)
   UPDATE iceberg.testdb.testtb SET memo='new-value' WHERE id = **2** ; 
   
   Then query the table from spark-sql.
   Rows with IDs 1, 3, 4 are returned. (Expected)
   
   Case 2)
   UPDATE iceberg.testdb.testtb SET memo='new' WHERE id = **'2'** ; 
   
   Then query the table from spark-sql.
   **Rows with IDs 1, 2, 3, 4 are returned. (Unexpected: the deleted row with id 2 is back.)** 
   
![image](https://user-images.githubusercontent.com/20290980/165416770-58d5ce73-0e8b-431e-a3d9-4d3ac9315811.png)
   
   
   **Is there a known problem with mixing writes from the Iceberg Java API and 
spark-sql on the same table? It seems that the existing delete file information 
is lost when an UPDATE/DELETE statement is run in spark-sql.**
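
To make the suspicion concrete, here is a toy model of the symptom, not Iceberg code. The classes `DataFile` and `EqualityDelete` and the method `scan` are hypothetical names made up for this sketch. It assumes the three API writes were committed with sequence numbers 1, 2 and 3, and that an equality delete hides matching rows only in data files with a strictly lower sequence number, as the Iceberg table spec describes. It only shows what a reader would return with and without the delete file. It does not claim to reproduce what spark-sql actually does internally.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Iceberg APIs.
final class EqualityDeleteSketch {

    /** A data file: the commit sequence number and the row ids it contains. */
    static final class DataFile {
        final long sequenceNumber;
        final List<Integer> ids;

        DataFile(long sequenceNumber, List<Integer> ids) {
            this.sequenceNumber = sequenceNumber;
            this.ids = ids;
        }
    }

    /** An equality delete on the id column, committed at a given sequence number. */
    static final class EqualityDelete {
        final long sequenceNumber;
        final int id;

        EqualityDelete(long sequenceNumber, int id) {
            this.sequenceNumber = sequenceNumber;
            this.id = id;
        }
    }

    /** Returns the ids a reader would see after applying the given deletes. */
    static List<Integer> scan(List<DataFile> dataFiles, List<EqualityDelete> deletes) {
        List<Integer> visible = new ArrayList<>();
        for (DataFile file : dataFiles) {
            for (int id : file.ids) {
                boolean deleted = false;
                for (EqualityDelete delete : deletes) {
                    // A delete applies only to data files committed strictly before it.
                    if (delete.id == id && delete.sequenceNumber > file.sequenceNumber) {
                        deleted = true;
                        break;
                    }
                }
                if (!deleted) {
                    visible.add(id);
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Assumed history: 3 rows, then 1 row, then an equality delete for id = 2.
        List<DataFile> dataFiles = List.of(
                new DataFile(1, List.of(1, 2, 3)),
                new DataFile(2, List.of(4)));
        List<EqualityDelete> deletes = List.of(new EqualityDelete(3, 2));

        // Delete file applied: matches the result seen before the UPDATE and in case 1.
        System.out.println(scan(dataFiles, deletes));   // prints [1, 3, 4]

        // Delete file ignored or dropped: matches the result seen in case 2.
        System.out.println(scan(dataFiles, List.of())); // prints [1, 2, 3, 4]
    }
}
```

The second call returns the same four rows as case 2, which is why it looks as if the delete file was not carried over by the UPDATE.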
   
   
   The Iceberg API is used as shown below.
   (The writer is implemented by extending BaseEqualityDeltaWriter.)
   
   ...
   WriteResult files = writer.complete();
   RowDelta newRowDelta = icebergTable.newRowDelta();
   Arrays.stream(files.dataFiles()).forEach(newRowDelta::addRows);
   Arrays.stream(files.deleteFiles()).forEach(newRowDelta::addDeletes);
   ....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

