liuzx8888 opened a new issue, #8066: URL: https://github.com/apache/iceberg/issues/8066
### Query engine

Spark 3.2

### Question

When I execute the rewriteDataFiles API, the small files are merged into large files, but the original small files still exist on HDFS. The merged large files are owned by the root user, while the original small files are owned by the flink user. Querying `SELECT * FROM "fy_zy_fymx5$files"` shows that the valid files are the ones generated by rewriteDataFiles; the previous small files should be invalid.

I then ran, in order: rewriteManifests API --> expireSnapshots API --> deleteOrphanFiles API

```scala
// expireSnapshots
if (operationMode.equalsIgnoreCase("expireSnapshots")) {
  val tsToExpire = System.currentTimeMillis()
  SparkActions
    .get(spark)
    .expireSnapshots(table)
    .expireOlderThan(tsToExpire)
    .execute()
}

// deleteOrphanFiles
if (operationMode.equalsIgnoreCase("deleteOrphanFiles")) {
  val olderThanTimestamp = System.currentTimeMillis()
  SparkActions
    .get(spark)
    .deleteOrphanFiles(table)
    .olderThan(olderThanTimestamp)
    .prefixMismatchMode(PrefixMismatchMode.DELETE)
    .execute()
}

// rewriteManifests
if (operationMode.equalsIgnoreCase("rewriteManifests")) {
  SparkActions
    .get(spark)
    .rewriteManifests(table)
    .execute()
}
```

The small files still exist on HDFS and have not been deleted. How can I safely delete the small files after executing the rewriteDataFiles API?
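As a point of comparison, here is a minimal sketch of the expire-then-clean sequence, assuming a configured `SparkSession` (`spark`) and a loaded Iceberg table identifier (`table`) as in the snippet above. The key detail is that `expireSnapshots` itself deletes data files that are no longer reachable from any retained snapshot, so the rewritten small files should be removed by that action once every pre-rewrite snapshot has been expired; `deleteOrphanFiles` only handles files that no snapshot ever referenced (e.g. leftovers from failed writes):

```scala
import org.apache.iceberg.spark.actions.SparkActions

// Expire all snapshots older than "now", keeping only the newest one.
// Data files referenced solely by the expired snapshots -- i.e. the
// small files replaced by rewriteDataFiles -- are deleted as part of
// this action.
SparkActions
  .get(spark)
  .expireSnapshots(table)
  .expireOlderThan(System.currentTimeMillis())
  .retainLast(1) // keep at least the current snapshot
  .execute()

// deleteOrphanFiles removes only files that are not reachable from any
// snapshot at all. Using a cutoff a few days in the past avoids racing
// with in-flight writes.
SparkActions
  .get(spark)
  .deleteOrphanFiles(table)
  .olderThan(System.currentTimeMillis() - java.util.concurrent.TimeUnit.DAYS.toMillis(3))
  .execute()
```

One thing worth checking in this setup: since the small files were written by the flink user while the maintenance job runs as root, the HDFS permissions must allow the deleting user to remove the flink-owned files, or the delete may fail even though the metadata no longer references them.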
