mithilgirish commented on PR #1104: URL: https://github.com/apache/iceberg-go/pull/1104#issuecomment-4506942462
All five concerns are valid and addressed in the updated patch.

* **Error Propagation:** Fixed. Errors from `WalkDir`, `DeleteFiles`, and individual `Remove` calls are now collected via `errors.Join` and returned. `os.IsNotExist` is safely ignored. The declared `(err error)` return now carries actual signal.
* **No Silent Fallbacks:** Fixed. The `LoadTable` fallback in SQL, Glue, and Hive that was quietly downgrading `--purge` on any non-`ErrNoSuchTable` error has been removed. The catalog now returns a wrapped error immediately; the table entry is not dropped until a purge can be attempted.
* **Tracking External Paths:** Fixed. `PurgeTableFiles` now unions the base `Location()` walk with a full crawl of active metadata, manifests, and snapshots. This mirrors the PyIceberg/Java approach and catches files under custom `write.data.path` / `write.metadata.path`. Holding the push on this last point until we make a call on **Option A vs B**:
  * **Option A:** Promote `getReferencedFiles` in `table/orphan_cleanup.go` to a public method on `Table`. No duplication, logic is already correct, but the diff touches `table/`.
  * **Option B:** Private helper in `catalog/internal/utils.go` via public metadata APIs. PR stays isolated to `catalog/`, but introduces duplication that needs to stay in sync.
* **Test Coverage:** `TestPurgeTable` now writes real Parquet + a mock `stats.puffin` outside the table root, asserting both are fully wiped. Glue and Hive mock purge tests were removed: with no filesystem backend, they weren't covering `PurgeTableFiles` at all.
* **PurgeableTable GoDoc:** Documents the best-effort model, type-assertion pattern, the un-loadable-on-failure warning, and the client-side vs REST (`purgeRequested=true`) distinction.

Let me know your preference on Option A or B, and I will push the updated patch.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
