aglinxinyuan opened a new issue, #5089:
URL: https://github.com/apache/texera/issues/5089
### Feature Summary
`DocumentFactory` (in both the Scala `common/workflow-core` module and the Python
`amber` package) exposes `createDocument` / `openDocument` (and the snake_case
Python equivalents), but offers no way to ask whether a document already exists
at a given `vfs://` URI without trying to open it.
Today, code that wants a *create-only-if-absent* flow has to call
`openDocument` inside a try/catch and inspect the failure — which conflates
"the table doesn't exist" with "the catalog rejected the request for some other
reason," and pays the cost of loading full table metadata just to answer a
boolean.
### Proposed Solution or Design
Add a new helper in both languages that performs a focused existence probe
via the Iceberg catalog's native `tableExists` API:
- Scala: `DocumentFactory.documentExists(uri: URI): Boolean`
- Python: `DocumentFactory.document_exists(uri: str) -> bool`
Behavior:
- For `vfs://` URIs: resolve the `VFSResourceType` to its Iceberg namespace,
then call `Catalog.tableExists(TableIdentifier.of(namespace, storageKey))`
(Scala) / `catalog.table_exists(f"{namespace}.{storage_key}")` (Python).
Unexpected catalog errors propagate rather than being swallowed.
- For unsupported schemes: throw `UnsupportedOperationException` / raise
`NotImplementedError`.
- For unsupported `VFSResourceType` values: throw `IllegalArgumentException`
/ raise `ValueError`.
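The behavior above can be sketched in Python roughly as follows. This is a minimal sketch, not the amber implementation: the `VFSResourceType` members, the namespace strings, and the injected `decode_vfs_uri` callable (standing in for however amber actually parses a `vfs://` URI into a resource type and storage key) are all hypothetical. The catalog only needs a `table_exists(identifier)` method, which pyiceberg's `Catalog` provides.

```python
from enum import Enum
from typing import Callable, Protocol, Tuple
from urllib.parse import urlparse


class VFSResourceType(Enum):
    # Hypothetical members; the real amber enum may differ.
    RESULT = "result"
    RUNTIME_STATISTICS = "runtime_statistics"
    UNMAPPED_EXAMPLE = "unmapped_example"  # hypothetical, deliberately has no namespace


# Hypothetical resource-type -> Iceberg namespace mapping.
_NAMESPACES = {
    VFSResourceType.RESULT: "operator_port_result",
    VFSResourceType.RUNTIME_STATISTICS: "operator_runtime_statistics",
}


class CatalogLike(Protocol):
    """The only catalog capability the probe needs."""

    def table_exists(self, identifier: str) -> bool: ...


def document_exists(
    uri: str,
    catalog: CatalogLike,
    decode_vfs_uri: Callable[[str], Tuple[VFSResourceType, str]],  # hypothetical decoder
) -> bool:
    """Probe for a document without loading its table metadata."""
    scheme = urlparse(uri).scheme
    if scheme != "vfs":
        # Unsupported schemes are rejected before touching the catalog.
        raise NotImplementedError(f"Unsupported URI scheme: {scheme!r}")
    resource_type, storage_key = decode_vfs_uri(uri)
    namespace = _NAMESPACES.get(resource_type)
    if namespace is None:
        raise ValueError(f"Unsupported VFSResourceType: {resource_type}")
    # No try/except here: unexpected catalog errors propagate to the caller.
    return catalog.table_exists(f"{namespace}.{storage_key}")
```

A create-only-if-absent caller can then branch on the boolean instead of catching an exception from the open call. The probe and a subsequent create are still two separate catalog calls, so a concurrent writer could create the table in between.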
While we're touching this, extract a private `resolveNamespace` /
`_resolve_namespace` helper so `createDocument`, `openDocument`, and the new
existence check share one resource-type → namespace mapping in each language,
instead of three copies that can drift.
### Affected Area
- Storage / Metadata