Hello Team,

I'm writing to propose a change to the orphan file removal logic in this PR <https://github.com/apache/iceberg/pull/12278>.
Currently, the orphan file removal process lists files at the root of the table to identify orphan files. This can lead to unintended consequences when multiple tables share a common root directory.

Example:
*tbl1* -> /dir1/tbl1
*tbl2* -> /dir1

Orphan removal on tbl2 can delete files under the tbl1 directory, since the listing happens at /dir1.

I propose modifying the orphan file removal logic to list only within the `data` and `metadata` directories of the target table. This would ensure that only files within those directories, and therefore (in most cases) directly associated with the table, are considered for removal.

Are there any potential drawbacks or edge cases that I haven't considered?

*Note:*
1. This does not address scenarios where a table is nested within the `data` or `metadata` directory of another table. Example:
   *tbl1* -> dir/tbl1
   *tbl2* -> dir/tbl1/data/tbl2
2. It also does not address the case where two tables have the same location.

Related discussions on location ownership are here <https://github.com/apache/iceberg/issues/4159> and here <https://github.com/apache/iceberg/issues/9133>.

Eager to hear your feedback here or on the PR.

Thank you!
- Karuppayya
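
P.S. To make the difference between the two listing strategies concrete, here is a minimal Python sketch. The helper names (`root_listing`, `scoped_listing`) and the file paths are hypothetical and are not Iceberg APIs. The sketch models only the listing step over plain path strings, not the later comparison against files referenced by table metadata.

```python
def root_listing(table_location, all_files):
    """Current behavior as described above: every file under the table root is listed."""
    root = table_location.rstrip("/") + "/"
    return [f for f in all_files if f.startswith(root)]


def scoped_listing(table_location, all_files, subdirs=("data", "metadata")):
    """Proposed behavior: list only files under <location>/data and <location>/metadata."""
    root = table_location.rstrip("/")
    prefixes = tuple(f"{root}/{d}/" for d in subdirs)
    return [f for f in all_files if f.startswith(prefixes)]


# Hypothetical layout: tbl2 lives at /dir1, tbl1 lives at /dir1/tbl1.
files = [
    "/dir1/data/a.parquet",                  # belongs to tbl2
    "/dir1/metadata/v1.metadata.json",       # belongs to tbl2
    "/dir1/tbl1/data/b.parquet",             # belongs to tbl1
    "/dir1/tbl1/metadata/v1.metadata.json",  # belongs to tbl1
]

# Listing at the root of tbl2 also picks up tbl1's files.
print(root_listing("/dir1", files))

# Listing only data/ and metadata/ of tbl2 leaves tbl1's files out.
print(scoped_listing("/dir1", files))

# Limitation from note 1: a table nested inside another table's data/ is still listed.
nested = ["dir/tbl1/data/tbl2/data/x.parquet"]
print(scoped_listing("dir/tbl1", nested))
```

With this layout, the root listing for tbl2 includes tbl1's files, while the scoped listing does not. The last call shows that the nested-table case from note 1 is still not covered.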