aokolnychyi commented on code in PR #4652:
URL: https://github.com/apache/iceberg/pull/4652#discussion_r889272004
##########
api/src/main/java/org/apache/iceberg/actions/DeleteOrphanFiles.java:
##########
@@ -80,6 +81,28 @@ public interface DeleteOrphanFiles extends Action<DeleteOrphanFiles, DeleteOrpha
*/
DeleteOrphanFiles executeDeleteWith(ExecutorService executorService);
+
+ /**
+ * Pass a mode for handling the files that cannot be determined if they are orphan
+ * @param mode mode for handling files that cannot be determined if they are orphan
+ * @return this for method chaining
+ */
+ DeleteOrphanFiles prefixMismatchMode(String mode);
+
+ /**
+ * Pass a list of schemes to be considered equivalent when finding orphan files
+ * @param equivalentSchemes list of equivalent schemes
+ * @return this for method chaining
+ */
+ DeleteOrphanFiles equivalentSchemes(List<String> equivalentSchemes);
+
+ /**
+ * Pass a list of authorities to be considered equivalent when finding orphan files
Review Comment:
@RussellSpitzer, let me give an example. Suppose you have two completely
different buckets (`b1` and `b2`). You write new data into `b1` but
periodically clone files from `b1` into `b2`. The table contains some files in
`b1` and some files in `b2`. Now imagine one clone job failed but still wrote a
file in `b2`.
```
s3://b2/path/to/file.parquet
```
Whenever you decide to clean `b2` against the table metadata, this file
should be considered an orphan. Because the clone job failed, the metadata
still references the original file in `b1`.
```
s3://b1/path/to/file.parquet
```
Under the current design, there is no way to clean up that file. I don't have a
real use case for this, so the question is whether we should worry about it.
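
To make the scenario concrete, here is a rough sketch of the comparison I have in mind. It is not the code in this PR: `AuthorityMatchSketch`, `key`, and `orphans` are hypothetical names, and I am assuming that files are matched on authority plus path after every authority in the equivalent set is collapsed into one placeholder.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only: hypothetical names, not Iceberg's implementation.
public class AuthorityMatchSketch {

  // Builds a comparison key for a location. Any authority listed in
  // equivalentAuthorities is replaced by a single placeholder ("*").
  static String key(String location, Set<String> equivalentAuthorities) {
    URI uri = URI.create(location);
    String authority = uri.getAuthority();
    if (authority != null && equivalentAuthorities.contains(authority)) {
      authority = "*";
    }
    return authority + uri.getPath();
  }

  // Returns the actual files whose key matches no referenced file.
  static List<String> orphans(
      List<String> actualFiles, List<String> referencedFiles, Set<String> equivalentAuthorities) {
    Set<String> referencedKeys = referencedFiles.stream()
        .map(location -> key(location, equivalentAuthorities))
        .collect(Collectors.toSet());
    return actualFiles.stream()
        .filter(location -> !referencedKeys.contains(key(location, equivalentAuthorities)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // File left behind in b2 by the failed clone job.
    List<String> actual = List.of("s3://b2/path/to/file.parquet");
    // The table metadata still points at the original in b1.
    List<String> referenced = List.of("s3://b1/path/to/file.parquet");

    // b1 and b2 treated as equivalent: the leftover file looks referenced.
    System.out.println(orphans(actual, referenced, Set.of("b1", "b2"))); // prints []

    // Authorities compared strictly: the leftover file is reported.
    System.out.println(orphans(actual, referenced, Set.of())); // prints [s3://b2/path/to/file.parquet]
  }
}
```

In this sketch, once `b1` and `b2` are treated as equivalent, the leftover file in `b2` can no longer be distinguished from the referenced file in `b1`.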
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]