The GitHub Actions job "License Binary Checker" on texera.git/main has failed. Run started by GitHub user bobbai00 (triggered by bobbai00).
Head commit for run:
e27b98aa0850013697403b8c66a529e5eb67ad66 / Xinyuan Lin <[email protected]>
feat(storage): add DocumentFactory.documentExists (#5085)

### What changes were proposed in this PR?

Adds a `documentExists`-style helper to `DocumentFactory` in both the Scala and Python code paths, so callers can check whether an iceberg-backed document already exists at a `vfs://` URI without catching exceptions from `openDocument` / `open_document`.

- Scala: new `DocumentFactory.documentExists(uri: URI): Boolean`. Resolves the `VFSResourceType` to its iceberg namespace, then probes the catalog via `IcebergCatalogInstance.getInstance().tableExists(TableIdentifier.of(namespace, storageKey))`. Throws `UnsupportedOperationException` for non-`vfs` URI schemes; `IllegalArgumentException` for unsupported resource types.
- Python: new `DocumentFactory.document_exists(uri: str) -> bool`. Same shape: probes via `catalog.table_exists(f"{namespace}.{storage_key}")`; raises `NotImplementedError` / `ValueError` symmetrically.
- Refactor: extracted a private `resolveNamespace` (Scala) and `_resolve_namespace` (Python) so `createDocument`, `openDocument`, and the new helper share one resource-type → namespace mapping in each language.
- Why `Catalog.tableExists` rather than `loadTableMetadata`: `loadTableMetadata` catches every exception and returns `None`, so a transient catalog error would have surfaced as a false-negative "doesn't exist" answer. `Catalog.tableExists` only returns `false` on actual not-found, and lets unexpected errors propagate.
- The change in `open_document` from a hard-coded `"vfs"` literal to `VFSURIFactory.VFS_FILE_URI_SCHEME` aligns the three methods on the same scheme constant.

### Any related issues, documentation, discussions?

Closes: #5089

### How was this PR tested?

- `sbt "WorkflowCore/Test/compile"`: clean.
- `sbt "WorkflowCore/testOnly *IcebergDocumentSpec"`: 14/14 pass, including two new cases asserting `documentExists` returns true after `createDocument`, false on a fresh URI, and throws `UnsupportedOperationException` for an unsupported scheme.
- `sbt "WorkflowCore/testOnly *IcebergUtilSpec"`: 13/13 pass (refactor did not touch `IcebergUtil`).
- `pytest amber/src/test/python/core/storage/test_document_factory.py`: 11/11 pass, including four new cases covering `document_exists` returning true/false based on `catalog.table_exists`, raising `ValueError` on an unsupported resource type, and raising `NotImplementedError` on an unsupported scheme.
- `ruff check` clean on `document_factory.py` and `test_document_factory.py`.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF.

---------

Signed-off-by: Xinyuan Lin <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Co-authored-by: Meng Wang <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/25960604281

With regards,
GitHub Actions via GitBox
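
The rationale in the commit message for probing with `Catalog.tableExists` instead of `loadTableMetadata` can be illustrated with a small, self-contained sketch. Everything here is hypothetical and written only for illustration: `FakeCatalog`, `CatalogUnavailable`, the two `exists_via_*` helpers, and the `result.doc1` identifier are assumptions, not Texera's or iceberg's actual code. The sketch only mimics the behavior the message describes: a metadata loader that swallows every exception, next to an existence check that returns false only on a genuine not-found.

```python
class CatalogUnavailable(Exception):
    """Hypothetical transient catalog failure, e.g. a network timeout."""


class FakeCatalog:
    """Hypothetical in-memory stand-in for an iceberg catalog."""

    def __init__(self, tables, healthy=True):
        self._tables = set(tables)
        self._healthy = healthy

    def load_table(self, identifier):
        # A transient outage is a different failure from "table not found".
        if not self._healthy:
            raise CatalogUnavailable("catalog unreachable")
        if identifier not in self._tables:
            raise KeyError(identifier)
        return {"name": identifier}

    def table_exists(self, identifier):
        # Returns False only on a genuine not-found; other errors propagate.
        try:
            self.load_table(identifier)
            return True
        except KeyError:
            return False


def load_table_metadata(catalog, identifier):
    # Mimics the described behavior: catch every exception, return None.
    try:
        return catalog.load_table(identifier)
    except Exception:
        return None


def exists_via_metadata(catalog, identifier):
    # An outage is indistinguishable from "not found" on this path.
    return load_table_metadata(catalog, identifier) is not None


def exists_via_catalog(catalog, identifier):
    # An outage surfaces as an exception instead of a wrong answer.
    return catalog.table_exists(identifier)


healthy = FakeCatalog({"result.doc1"})
broken = FakeCatalog({"result.doc1"}, healthy=False)

print(exists_via_metadata(healthy, "result.doc1"))  # table present
print(exists_via_metadata(broken, "result.doc1"))   # false negative
try:
    exists_via_catalog(broken, "result.doc1")
except CatalogUnavailable as err:
    print(f"propagated: {err}")
```

Under these assumptions, the swallow-everything path reports an existing table as missing while the catalog is unreachable, whereas the catalog-level check lets the caller see the outage and decide how to handle it.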
