manishmalhotrawork opened a new pull request #1471:
URL: https://github.com/apache/iceberg/pull/1471
### **Problem:**
- Differences in scheme or authority between otherwise identical paths can cause
live/valid files to be identified as orphans and deleted.
- So, if file paths are like this:
```
in metadata/manifests:
hdfs:/user/head/warehouse/core_attribute_wom_live/data/site_group=PGKS/parent_wom=037/00048-13203-d1130e00-3244-4a6d-8772-a6a49f948778-00026.parquet
-----------------------------------------------------------------------------------------------------------------------------------------------------------
from file_system:
hdfs://nameservice1/user/head/warehouse/core_attribute_wom_live/data/site_group=PGKS/parent_wom=037/00048-13203-d1130e00-3244-4a6d-8772-a6a49f948778-00026.parquet
OR
myhdfs://nameservice1/user/head/warehouse/core_attribute_wom_live/data/site_group=PGKS/parent_wom=037/00048-13203-d1130e00-3244-4a6d-8772-a6a49f948778-00026.parquet
```
then a direct comparison between the metadata/valid paths and the FS paths will not
match, and the live file will be identified for deletion.
- Especially with HDFS, it is very common to access the same file system using
different authority names, or with no authority at all. It all depends on how
these names are mapped in core-site.xml and on the server side.
### **Solution:**
- Use `Path.toUri()` and then `URI.getPath()` (no scheme or authority) to
compute the relative path of each file.
- Compare these relative paths between metadata/valid files and FS files.
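
A minimal sketch of the comparison described above, not the code from this PR. It uses only `java.net.URI` from the JDK rather than Hadoop's `Path`, so it runs without extra dependencies. The class and method names (`OrphanPathSketch`, `pathOnly`, `orphans`) are hypothetical, and it assumes every location is already a valid URI string (for example, no unescaped spaces).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; names are not from the PR.
final class OrphanPathSketch {
  private OrphanPathSketch() {}

  // Drops the scheme and authority and keeps only the path component.
  // Assumes the location is a syntactically valid URI.
  static String pathOnly(String location) {
    return URI.create(location).getPath();
  }

  // Returns the listed locations whose path component is not referenced
  // by any metadata location. The original strings are returned unchanged.
  static List<String> orphans(List<String> metadataFiles, List<String> listedFiles) {
    Set<String> validPaths = new HashSet<>();
    for (String file : metadataFiles) {
      validPaths.add(pathOnly(file));
    }
    List<String> result = new ArrayList<>();
    for (String file : listedFiles) {
      if (!validPaths.contains(pathOnly(file))) {
        result.add(file);
      }
    }
    return result;
  }
}
```

With this approach, `hdfs:/user/head/...` and `hdfs://nameservice1/user/head/...` reduce to the same path and are treated as the same file. The trade-off is that two genuinely different file systems with identical paths would also be treated as equal.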