CTTY opened a new issue, #1877:
URL: https://github.com/apache/iceberg-rust/issues/1877

   ### Is your feature request related to a problem or challenge?
   
   In the current `IcebergTableProvider`, we allow users to create a table 
provider directly from a static `Table`, enabling queries without configuring a 
catalog (#650).
   
   However, this creates a subtle issue for users reading a live, changing 
table: `IcebergTableProvider` will not automatically refresh the table. Users 
must manually re-create and re-register the catalog provider to ensure they see 
the latest data:
   
   ```rust
   // Refresh context to avoid getting a stale table
   let catalog = Arc::new(IcebergCatalogProvider::try_new(client).await?);
   ctx.register_catalog("catalog", catalog);
   ```
   
   Supporting static tables in `IcebergTableProvider` also means the catalog 
may be `None`. When the catalog is `None`, users must construct and register a 
new static table **every time** they want to read the table.
   
   This problem has become even more noticeable now that we support `INSERT 
INTO` in DataFusion, allowing users to read and write Iceberg tables within the 
same session:
   
   ```sql
   INSERT INTO test_table VALUES ...;
   SELECT * FROM test_table; 
   -- The inserted rows won't appear because the registered table
   -- wasn't refreshed.
   ```
   
   There is some ongoing work related to this, such as #1297, but I believe we 
need a broader design discussion to address this issue once and for all. Hence 
this issue.
   
   
   ### Describe the solution you'd like
   
   - Option 1: Split the existing `IcebergTableProvider` into two providers
   1. `IcebergTableProvider`: this provider holds an `Arc<dyn Catalog>` and does 
not cache a `Table`. Whenever it needs the Iceberg `Table`, it calls 
`catalog.load_table`.
   2. `IcebergStaticTableProvider`: this provider holds only a cached `table: 
Table` and does not need to be attached to a catalog.
   This way, users can pick the table provider that fits their use case, and 
each provider has a single, well-defined behavior. The downside is that this 
is a breaking change.
   
   - Option 2: Refresh when `Catalog` is available
   This is basically what #1297 suggests, except in the [latest 
code](https://github.com/apache/iceberg-rust/blob/main/crates/integrations/datafusion/src/table/mod.rs#L58)
 we have `catalog: Option<Arc<dyn Catalog>>` rather than `catalog: Arc<dyn 
Catalog>`, so we can only refresh when `catalog` is not `None`.
   This option requires fewer changes and won't break existing use cases, but 
users will need to take extra care to get the behavior they want: a provider 
built from a static `Table` still never refreshes.
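
To make the difference between the two options concrete, here is a minimal, 
std-only sketch. Everything in it is a hypothetical stand-in rather than the 
real iceberg-rust API: `Table` is reduced to a snapshot id, `Catalog` is a toy 
synchronous trait (the real `load_table` is async and takes a table 
identifier), and `MemoryCatalog`, `CatalogBackedProvider`, `StaticProvider`, 
and `OptionalCatalogProvider` are names invented for illustration.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the Iceberg `Table`; only the snapshot id matters here.
#[derive(Clone, Debug, PartialEq)]
struct Table {
    snapshot_id: u64,
}

// Toy, synchronous stand-in for the `Catalog` trait (assumption: the real one is async).
trait Catalog {
    fn load_table(&self, name: &str) -> Option<Table>;
}

// In-memory catalog holding one table; `commit` simulates an `INSERT INTO`.
struct MemoryCatalog {
    current: Mutex<Table>,
}

impl MemoryCatalog {
    fn new() -> Self {
        MemoryCatalog { current: Mutex::new(Table { snapshot_id: 1 }) }
    }

    fn commit(&self) {
        self.current.lock().unwrap().snapshot_id += 1;
    }
}

impl Catalog for MemoryCatalog {
    fn load_table(&self, _name: &str) -> Option<Table> {
        Some(self.current.lock().unwrap().clone())
    }
}

// Option 1, provider A: always asks the catalog, keeps no cached table.
struct CatalogBackedProvider {
    catalog: Arc<dyn Catalog>,
    name: String,
}

impl CatalogBackedProvider {
    fn table(&self) -> Option<Table> {
        self.catalog.load_table(&self.name)
    }
}

// Option 1, provider B: a fixed table, no catalog, never refreshes.
struct StaticProvider {
    table: Table,
}

impl StaticProvider {
    fn table(&self) -> Table {
        self.table.clone()
    }
}

// Option 2: one provider; refreshes only when a catalog happens to be present.
struct OptionalCatalogProvider {
    table: Table,
    catalog: Option<Arc<dyn Catalog>>,
    name: String,
}

impl OptionalCatalogProvider {
    fn table(&mut self) -> Table {
        if let Some(catalog) = &self.catalog {
            if let Some(fresh) = catalog.load_table(&self.name) {
                self.table = fresh;
            }
        }
        // With `catalog: None` this silently returns the original, possibly stale table.
        self.table.clone()
    }
}
```

In this sketch, after a `commit()` the catalog-backed provider and the 
Option 2 provider with `Some(catalog)` see the new snapshot, while the static 
provider and the Option 2 provider with `None` keep returning the old one. 
With Option 1 that difference is visible in the type the user chose; with 
Option 2 it depends on a runtime field.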
   
   Would love to hear if there are other potential solutions!
   
   ### Willingness to contribute
   
   I would be willing to contribute to this feature with guidance from the 
Iceberg Rust community


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

