[GitHub] [iceberg] xianyouQ opened a new issue, #7522: delete files with spark sql failed

via GitHub Wed, 03 May 2023 19:43:30 -0700


xianyouQ opened a new issue, #7522:
URL: https://github.com/apache/iceberg/issues/7522


   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We have a very large table, partitioned by day, that is written by a Flink 
job using the Hive catalog. The daily incremental data is close to 15 TB, in 
40000+ files. We deleted older data with Spark SQL, using a statement like 
delete from table where dt = '20230415', and the delete appeared to succeed 
(a Spark SQL count on that partition returned 0 rows). A few minutes later, 
however, the deleted data had been rolled back and was queryable again through 
Spark SQL.
   We checked the manifest files and found two snapshots with the same sequence 
number: one was generated by the Spark job that ran the delete operation, and 
the other was generated by the Flink job. The subsequent manifest files were 
generated based on the second one. It seems that the Hive lock didn't work.
   Any ideas?
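
For reference, the check described above (looking for snapshots that share a sequence number, and for snapshots that later commits no longer build on) can be sketched roughly as follows. This is only an illustration, not Iceberg code: the record layout, key names, snapshot IDs, and writer labels are all hypothetical stand-ins for values read from the table metadata.

```python
from collections import defaultdict

# Hypothetical snapshot records. Real values would come from the table's
# metadata; every ID, key name, and writer label here is made up.
snapshots = [
    {"snapshot_id": 101, "sequence_number": 7, "parent_id": 100, "writer": "flink"},
    {"snapshot_id": 102, "sequence_number": 8, "parent_id": 101, "writer": "spark-delete"},
    {"snapshot_id": 103, "sequence_number": 8, "parent_id": 101, "writer": "flink"},
    {"snapshot_id": 104, "sequence_number": 9, "parent_id": 103, "writer": "flink"},
]


def find_duplicate_sequence_numbers(snapshots):
    """Return {sequence_number: [snapshot_id, ...]} for numbers used more than once."""
    by_seq = defaultdict(list)
    for snap in snapshots:
        by_seq[snap["sequence_number"]].append(snap["snapshot_id"])
    return {seq: ids for seq, ids in by_seq.items() if len(ids) > 1}


def orphaned_snapshots(snapshots, current_id):
    """Return IDs of listed snapshots that are not ancestors of the current one."""
    parents = {s["snapshot_id"]: s["parent_id"] for s in snapshots}
    lineage = set()
    node = current_id
    # Walk parent pointers back from the current snapshot.
    while node in parents:
        lineage.add(node)
        node = parents[node]
    return sorted(set(parents) - lineage)


duplicates = find_duplicate_sequence_numbers(snapshots)
orphans = orphaned_snapshots(snapshots, current_id=104)
print(duplicates)  # {8: [102, 103]}
print(orphans)     # [102]
```

In this made-up data, the delete snapshot (102) and a Flink snapshot (103) share a sequence number and a parent, and later commits build only on 103, which matches the pattern we saw: the delete is dropped from the table's history.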


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

