The GitHub Actions job "License Binary Checker" on texera.git/main has failed.
Run started by GitHub user bobbai00 (triggered by bobbai00).

Head commit for run:
e27b98aa0850013697403b8c66a529e5eb67ad66 / Xinyuan Lin <[email protected]>
feat(storage): add DocumentFactory.documentExists (#5085)

### What changes were proposed in this PR?

Adds a `documentExists`-style helper to `DocumentFactory` in both the
Scala and Python code paths, so callers can check whether an
iceberg-backed document already exists at a `vfs://` URI without
catching exceptions from `openDocument` / `open_document`.

- Scala: new `DocumentFactory.documentExists(uri: URI): Boolean`.
Resolves the `VFSResourceType` to its iceberg namespace, then probes the
catalog via
`IcebergCatalogInstance.getInstance().tableExists(TableIdentifier.of(namespace,
storageKey))`. Throws `UnsupportedOperationException` for non-`vfs` URI
schemes; `IllegalArgumentException` for unsupported resource types.
- Python: new `DocumentFactory.document_exists(uri: str) -> bool`. Same
shape: probes via `catalog.table_exists(f"{namespace}.{storage_key}")`;
raises `NotImplementedError` / `ValueError` symmetrically.
- Refactor: extracted a private `resolveNamespace` (Scala) and
`_resolve_namespace` (Python) so `createDocument`, `openDocument`, and
the new helper share one resource-type → namespace mapping in each
language.
- Why `Catalog.tableExists` rather than `loadTableMetadata`:
`loadTableMetadata` catches every exception and returns `None`, so a
transient catalog error would have surfaced as a false-negative "doesn't
exist" answer. `Catalog.tableExists` only returns `false` on actual
not-found, and lets unexpected errors propagate.
- The change in `open_document` from a hard-coded `"vfs"` literal to
`VFSURIFactory.VFS_FILE_URI_SCHEME` aligns the three methods on the same
scheme constant.
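
A minimal Python sketch of the behavior described in the list above, not the actual Texera code. `FakeCatalog`, the `NAMESPACES` mapping, the resource-type names, the `vfs:///<resource_type>/<storage_key>` URI layout, and passing the catalog as an argument are all assumptions made for illustration. Only the `document_exists` name, the `vfs` scheme check, the `catalog.table_exists(f"{namespace}.{storage_key}")` probe, and the `NotImplementedError` / `ValueError` behavior come from the description.

```python
from urllib.parse import urlparse

# Stands in for VFSURIFactory.VFS_FILE_URI_SCHEME (value assumed to be "vfs").
VFS_SCHEME = "vfs"

# Hypothetical resource-type -> iceberg namespace mapping.
NAMESPACES = {"result": "operator_result", "console": "console_messages"}


class FakeCatalog:
    """Hypothetical stand-in for an iceberg catalog, used only in this sketch."""

    def __init__(self, tables, fail=False):
        self._tables = set(tables)
        self._fail = fail

    def table_exists(self, identifier: str) -> bool:
        # Simulate a transient catalog error when requested.
        if self._fail:
            raise ConnectionError("catalog unreachable")
        return identifier in self._tables


def resolve_namespace(resource_type: str) -> str:
    """Single shared mapping from resource type to namespace."""
    try:
        return NAMESPACES[resource_type]
    except KeyError:
        raise ValueError(f"unsupported resource type: {resource_type}") from None


def document_exists(catalog, uri: str) -> bool:
    parsed = urlparse(uri)
    if parsed.scheme != VFS_SCHEME:
        raise NotImplementedError(f"unsupported URI scheme: {parsed.scheme}")
    # Assumed URI layout for this sketch: vfs:///<resource_type>/<storage_key>
    resource_type, storage_key = parsed.path.strip("/").split("/", 1)
    namespace = resolve_namespace(resource_type)
    # False only when the table is absent; catalog errors are not caught here.
    return catalog.table_exists(f"{namespace}.{storage_key}")
```

In this sketch a catalog built with `fail=True` makes `document_exists` raise `ConnectionError` instead of returning `False`, which mirrors the reason given above for probing with `tableExists` rather than a loader that swallows every exception.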

### Any related issues, documentation, discussions?

Closes: #5089

### How was this PR tested?

- `sbt "WorkflowCore/Test/compile"` — clean.
- `sbt "WorkflowCore/testOnly *IcebergDocumentSpec"` — 14/14 pass,
including two new cases asserting `documentExists` returns true after
`createDocument`, false on a fresh URI, and throws
`UnsupportedOperationException` for an unsupported scheme.
- `sbt "WorkflowCore/testOnly *IcebergUtilSpec"` — 13/13 pass (refactor
did not touch `IcebergUtil`).
- `pytest amber/src/test/python/core/storage/test_document_factory.py` —
11/11 pass, including four new cases covering `document_exists`
returning true/false based on `catalog.table_exists`, raising
`ValueError` on an unsupported resource type, and raising
`NotImplementedError` on an unsupported scheme.
- `ruff check` clean on `document_factory.py` and
`test_document_factory.py`.

### Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF.

---------

Signed-off-by: Xinyuan Lin <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Co-authored-by: Meng Wang <[email protected]>

Report URL: https://github.com/apache/texera/actions/runs/25960604281

With regards,
GitHub Actions via GitBox
