ConeyLiu opened a new pull request #2890:
URL: https://github.com/apache/iceberg/pull/2890
`RemoveOrphanFiles` uses `actualFileDF leftanti join validFileDF` to
determine which files should be removed. We list all the files under the
provided location (or the table location) with `FileSystem.listStatus` to
create `actualFileDF`. `validFileDF` is created by indexing the metadata files
and the files they reference.
However, `FileSystem.listStatus` qualifies the given path. For example, the
path `hdfs:/path` is qualified to `hdfs://host:port/path`. If the
`warehouse` is set to `hdfs:/path`, the two DataFrames contain:
`validFileDF`:
hdfs:/path/file1
hdfs:/path/file2
hdfs:/path/file3
....
`actualFileDF`:
hdfs://host:port/path/file1
hdfs://host:port/path/file2
hdfs://host:port/path/file3
....
No location in `actualFileDF` then matches a location in `validFileDF`, so
every file in `actualFileDF` is treated as invalid.
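
The mismatch can be reproduced without Spark. The snippet below is only an illustrative sketch: plain Python lists stand in for the two DataFrames, and `left_anti_join` is a hypothetical helper that mimics an exact-match `leftanti join`. It is not the code used by `RemoveOrphanFiles`.

```python
# Plain lists stand in for validFileDF and actualFileDF (an assumption made
# for illustration; the real action uses Spark DataFrames).
valid_files = [
    "hdfs:/path/file1",
    "hdfs:/path/file2",
    "hdfs:/path/file3",
]
actual_files = [
    "hdfs://host:port/path/file1",
    "hdfs://host:port/path/file2",
    "hdfs://host:port/path/file3",
]


def left_anti_join(left, right):
    """Return the rows of `left` that have no exact string match in `right`."""
    right_set = set(right)
    return [row for row in left if row not in right_set]


# Every listed file is reported as an orphan, because the qualified
# locations never equal the unqualified ones.
orphans = left_anti_join(actual_files, valid_files)
print(orphans)
```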
In this patch, we compare only the path component (with the scheme and
authority removed) when doing the `leftanti join`.
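
A rough sketch of the idea, again in plain Python rather than Spark: `path_only` and `find_orphans` are hypothetical helper names, `urllib.parse.urlparse` is used here as a stand-in for whatever path handling the patch actually uses, and `/path/stray` is a made-up unreferenced file added to show that real orphans are still detected.

```python
from urllib.parse import urlparse


def path_only(location):
    """Drop the scheme and authority, keeping only the path component."""
    return urlparse(location).path


def find_orphans(actual_files, valid_files):
    """Anti-join on the path component instead of the full location string."""
    valid_paths = {path_only(f) for f in valid_files}
    return [f for f in actual_files if path_only(f) not in valid_paths]


valid_files = [
    "hdfs:/path/file1",
    "hdfs:/path/file2",
    "hdfs:/path/file3",
]
actual_files = [
    "hdfs://host:port/path/file1",
    "hdfs://host:port/path/file2",
    "hdfs://host:port/path/file3",
    "hdfs://host:port/path/stray",  # hypothetical file with no reference
]

orphans = find_orphans(actual_files, valid_files)
print(orphans)  # ['hdfs://host:port/path/stray']
```

One consequence of this design is that two locations with the same path but a different scheme or authority compare as equal, so the comparison is only meaningful when both DataFrames describe the same file system.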
Updated the existing unit tests to cover this case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.