[GitHub] [iceberg] liuzx8888 opened a new issue, #8066: How Can safely delete small files after executed rewriteDataFiles

via GitHub Fri, 14 Jul 2023 03:34:35 -0700


liuzx8888 opened a new issue, #8066:
URL: https://github.com/apache/iceberg/issues/8066


   ### Query engine
   
   spark 3.2
   
   ### Question
   
   When I execute the rewriteDataFiles API, the small files are merged into 
large files, but the original small files still exist in HDFS. In the listing 
below, the files owned by the root user are the merged large files, and the 
files owned by the flink user are the original small files.
   
   
![image](https://github.com/apache/iceberg/assets/10862577/ea88917c-d207-4f10-8dcc-f5843ccad063)
   
   Running `SELECT * FROM "fy_zy_fymx5$files"` shows that the only valid files 
are the ones generated by the rewriteDataFiles API, so the previous small files 
should no longer be valid.
   
![image](https://github.com/apache/iceberg/assets/10862577/a3fa3ca2-7a9b-4c90-98cd-51226809cf02)
   
   I then ran the maintenance actions in this order: rewriteManifests --> 
expireSnapshots --> deleteOrphanFiles.
   
   ```
       import org.apache.iceberg.actions.DeleteOrphanFiles.PrefixMismatchMode
       import org.apache.iceberg.spark.actions.SparkActions
   
       // expireSnapshots: expire snapshots older than the current time
       if (operationMode.equalsIgnoreCase("expireSnapshots")) {
         val tsToExpire = System.currentTimeMillis()
         SparkActions
           .get(spark)
           .expireSnapshots(table)
           .expireOlderThan(tsToExpire)
           .execute()
       }
   
       // deleteOrphanFiles: remove files older than the current time that
       // the table metadata no longer references
       if (operationMode.equalsIgnoreCase("deleteOrphanFiles")) {
         val olderThanTimestamp = System.currentTimeMillis()
         SparkActions
           .get(spark)
           .deleteOrphanFiles(table)
           .olderThan(olderThanTimestamp)
           .prefixMismatchMode(PrefixMismatchMode.DELETE)
           .execute()
       }
   
       // rewriteManifests: rewrite the table's manifest files
       if (operationMode.equalsIgnoreCase("rewriteManifests")) {
         SparkActions
           .get(spark)
           .rewriteManifests(table)
           .execute()
       }
   ```
   These small files still exist in HDFS and have not been deleted. How can I 
safely delete the small files after executing the rewriteDataFiles API?
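
On the "safely" part of the question: both snippets above pass the current time as the cutoff, which leaves no retention window. The Iceberg documentation warns that removing orphan files with an interval shorter than the time a write needs to complete can corrupt the table, because files from in-progress writes may be treated as orphans; the action's default cutoff is three days ago. A minimal sketch of deriving such a cutoff is shown here. `RetentionCutoff` and `cutoffMillis` are hypothetical names used for illustration, not part of the Iceberg API, and the three-day window in the usage note is an assumption to adjust for the actual write pattern.

```scala
import java.time.{Duration, Instant}

// Hypothetical helper (not part of Iceberg): turns "now" and a retention
// window into the epoch-millisecond cutoff that olderThan(...) expects.
object RetentionCutoff {
  def cutoffMillis(now: Instant, retention: Duration): Long = {
    // A negative window would push the cutoff into the future.
    require(!retention.isNegative, "retention must not be negative")
    now.minus(retention).toEpochMilli
  }
}
```

The result could be passed to `.olderThan(...)` in place of `System.currentTimeMillis`, for example `RetentionCutoff.cutoffMillis(Instant.now(), Duration.ofDays(3))`. Whether the same window suits `.expireOlderThan(...)` depends on how long old snapshots need to stay readable.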


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

