CTTY opened a new issue, #1877:
URL: https://github.com/apache/iceberg-rust/issues/1877
### Is your feature request related to a problem or challenge?
In the current `IcebergTableProvider`, we allow users to create a table
provider directly from a static `Table`, enabling queries without configuring a
catalog (#650).
However, this creates a subtle issue for users reading a live, changing
table: `IcebergTableProvider` will not automatically refresh the table. Users
must manually refresh the catalog to ensure they see the latest data:
```rust
// Refresh context to avoid getting a stale table
let catalog = Arc::new(IcebergCatalogProvider::try_new(client).await?);
ctx.register_catalog("catalog", catalog);
```
Supporting static tables in `IcebergTableProvider` also means the catalog
may be `None`. When the catalog is `None`, users must construct and register a
new static table **every time** they want to read the table.
This problem has become even more noticeable now that we support `INSERT
INTO` in DataFusion, allowing users to read and write Iceberg tables within the
same session:
```sql
INSERT INTO test_table VALUES ...;
SELECT * FROM test_table;
-- The inserted rows won't appear because the registered table wasn't refreshed.
```
There is some ongoing work related to this, such as #1297, but I believe we
need a broader design discussion to address this issue once and for all. Hence
this issue.
### Describe the solution you'd like
- Option 1: Split the existing `IcebergTableProvider` into two providers
1. `IcebergTableProvider`: this provider holds an `Arc<dyn Catalog>` and does
not cache a `Table`. Whenever it needs the Iceberg `Table`, it calls
`catalog.load_table`.
2. `IcebergStaticTableProvider`: this provider only holds a cached `table: Table`
and does not need to be attached to a catalog.
This way, users can pick the table provider that matches their use case, and
each provider has a single, well-defined behavior. However, this would be a
breaking change.
- Option 2: Refresh when a `Catalog` is available
This is basically what #1297 suggests, except that in the [latest
code](https://github.com/apache/iceberg-rust/blob/main/crates/integrations/datafusion/src/table/mod.rs#L58)
we have `catalog: Option<Arc<dyn Catalog>>` rather than `catalog: Arc<dyn
Catalog>`, so we can only refresh when `catalog` is not `None`.
This option requires fewer changes and won't break existing use cases,
but users will need to take extra care to get the behavior they want.
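To make the difference between the two options concrete, here is a rough, self-contained sketch. It does not use the real `iceberg` or DataFusion APIs: `Table`, `Catalog`, `MemoryCatalog`, `CatalogBackedProvider`, `StaticProvider`, and `OptionalRefreshProvider` are all simplified stand-ins made up for illustration (the real `load_table` is async and fallible, and the real providers implement DataFusion's `TableProvider`). Falling back to the cached table when a load fails in the Option 2 variant is also just an assumption of this sketch.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for an Iceberg table; only the current snapshot id matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Table {
    pub snapshot_id: u64,
}

/// Stand-in for the catalog trait (simplified to a synchronous call).
pub trait Catalog: Send + Sync {
    fn load_table(&self, name: &str) -> Option<Table>;
}

/// Hypothetical in-memory catalog holding a single table.
pub struct MemoryCatalog {
    current: Mutex<Table>,
}

impl MemoryCatalog {
    pub fn new() -> Self {
        Self { current: Mutex::new(Table { snapshot_id: 1 }) }
    }

    /// Simulates a write (e.g. `INSERT INTO`) that produces a new snapshot.
    pub fn commit(&self) {
        self.current.lock().unwrap().snapshot_id += 1;
    }
}

impl Catalog for MemoryCatalog {
    fn load_table(&self, _name: &str) -> Option<Table> {
        Some(self.current.lock().unwrap().clone())
    }
}

/// Option 1, first provider: no cached table, always asks the catalog.
pub struct CatalogBackedProvider {
    catalog: Arc<dyn Catalog>,
    name: String,
}

impl CatalogBackedProvider {
    pub fn new(catalog: Arc<dyn Catalog>, name: &str) -> Self {
        Self { catalog, name: name.to_string() }
    }

    pub fn table(&self) -> Option<Table> {
        self.catalog.load_table(&self.name)
    }
}

/// Option 1, second provider: a fixed table, no catalog attached.
pub struct StaticProvider {
    table: Table,
}

impl StaticProvider {
    pub fn new(table: Table) -> Self {
        Self { table }
    }

    pub fn table(&self) -> Table {
        self.table.clone()
    }
}

/// Option 2: one provider that refreshes only when a catalog is present.
pub struct OptionalRefreshProvider {
    catalog: Option<Arc<dyn Catalog>>,
    name: String,
    table: Table,
}

impl OptionalRefreshProvider {
    pub fn new(catalog: Option<Arc<dyn Catalog>>, name: &str, table: Table) -> Self {
        Self { catalog, name: name.to_string(), table }
    }

    pub fn table(&self) -> Table {
        match &self.catalog {
            // Reload through the catalog; keep the cached table if loading fails.
            Some(catalog) => catalog
                .load_table(&self.name)
                .unwrap_or_else(|| self.table.clone()),
            // Without a catalog the provider can only return what it was given.
            None => self.table.clone(),
        }
    }
}
```

In this sketch, after a `commit()` on the catalog, `CatalogBackedProvider` and an `OptionalRefreshProvider` built with `Some(catalog)` both observe the new snapshot, while `StaticProvider` and an `OptionalRefreshProvider` built with `None` keep returning the original one. With Option 1 that difference is visible in the type the user picks; with Option 2 it depends on a runtime value.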
Would love to hear if there are other potential solutions!
### Willingness to contribute
I would be willing to contribute to this feature with guidance from the
Iceberg Rust community