nastra commented on PR #13245: URL: https://github.com/apache/iceberg/pull/13245#issuecomment-3030955118
> Thanks @nastra. On a second pass, one thing I realized is that we should probably double-check the metadata-only delete path for Spark to make sure we're cleaning up orphan DVs in that case too (where the predicates of the delete operation can be applied completely to metadata). In that case we're using the [`DeleteFiles` API](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L379), which hasn't been updated to remove the orphan DVs, so we would not be cleaning up the orphaned delete files unless I'm missing something. The current changes look good in general though; I just think there may be an additional case we need to address to make sure spec-compliant metadata is produced.
>
> cc @aokolnychyi @stevenzwu

Yes, I'm aware of that, and we won't be able to solve this in this PR. This is going to be addressed in https://github.com/apache/iceberg/pull/13222, which modifies the `MergingSnapshotProducer` and passes the files to be deleted to the delete manifest filter manager, which in turn will detect orphaned DVs.
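To illustrate the idea only (this is not the code from either PR): a minimal, self-contained sketch of what "detecting orphaned DVs" could look like, assuming each DV records the path of the single data file it applies to. The class `OrphanDvSketch`, the type `DeletionVector`, and the method `findOrphanedDvLocations` are hypothetical names made up for this example, not Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Iceberg.
class OrphanDvSketch {

    // Stand-in for a DV entry: where the DV lives and which data file it applies to.
    static final class DeletionVector {
        final String location;
        final String referencedDataFile;

        DeletionVector(String location, String referencedDataFile) {
            this.location = location;
            this.referencedDataFile = referencedDataFile;
        }
    }

    // A DV is treated as orphaned when the data file it references is being removed.
    static List<String> findOrphanedDvLocations(
            List<DeletionVector> dvs, Set<String> removedDataFiles) {
        List<String> orphans = new ArrayList<>();
        for (DeletionVector dv : dvs) {
            if (removedDataFiles.contains(dv.referencedDataFile)) {
                orphans.add(dv.location);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<DeletionVector> dvs = List.of(
                new DeletionVector("dv-1.puffin", "data-1.parquet"),
                new DeletionVector("dv-2.puffin", "data-2.parquet"));

        // Pretend a metadata-only delete drops data-1.parquet entirely.
        Set<String> removed = Set.of("data-1.parquet");

        // Only the DV that points at the removed data file is reported.
        System.out.println(findOrphanedDvLocations(dvs, removed));
    }
}
```

In this sketch, the set of removed data files plays the role of "the files to be deleted" that the comment says get passed along; how the real delete manifest filter manager receives and uses them is defined by the linked PR, not by this example.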
