alamb commented on issue #22553:
URL: https://github.com/apache/datafusion/issues/22553#issuecomment-4621802768

   > Implementor (a moka-backed cache, the default, etc.) owns the singleflight + loader internally. `fetch_metadata` in `datafusion/datasource-parquet/src/metadata.rs` switches to calling this single method instead of the explicit `get` / `is_valid_for` / `put` sequence. Async-fn-in-trait is already established by `infer_stats_and_ordering`, so no new ergonomic terrain on that front.
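
For illustration, the single-method shape described in the quote might look roughly like the sketch below. All names (`MetadataCache`, `get_or_load`, `FileKey`, `Metadata`, `InMemoryCache`) are hypothetical and are not existing DataFusion APIs. The `version` field is an assumed stand-in for whatever `is_valid_for` checks today. The caller supplies only a loader, and lookup, validation and insertion all happen inside the implementor. A busy-polling `block_on` replaces a real async runtime so the sketch has no dependencies.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical cache key: file path plus a version (e.g. size or mtime)
/// that decides whether a cached entry is still valid.
#[derive(Clone, Debug)]
struct FileKey {
    path: String,
    version: u64,
}

/// Hypothetical stand-in for decoded Parquet metadata.
#[derive(Debug)]
struct Metadata {
    num_rows: usize,
}

/// Hypothetical single-method cache API: lookup, validity check, load and
/// insert all happen inside the implementor.
trait MetadataCache {
    async fn get_or_load<F, Fut>(&self, key: &FileKey, load: F) -> Arc<Metadata>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Metadata>;
}

/// Simple implementor; a moka-backed one could also coalesce concurrent loads.
#[derive(Default)]
struct InMemoryCache {
    // path -> (version the entry was loaded for, metadata)
    entries: Mutex<HashMap<String, (u64, Arc<Metadata>)>>,
}

impl MetadataCache for InMemoryCache {
    async fn get_or_load<F, Fut>(&self, key: &FileKey, load: F) -> Arc<Metadata>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Metadata>,
    {
        // Former `get` + `is_valid_for`: reuse only an entry for the same version.
        // The lock guard is dropped at the end of this statement.
        let cached = self.entries.lock().unwrap().get(&key.path).cloned();
        if let Some((version, meta)) = cached {
            if version == key.version {
                return meta;
            }
        }
        // Miss or stale entry: run the caller's loader without holding the lock.
        let meta = Arc::new(load().await);
        // Former `put`.
        self.entries
            .lock()
            .unwrap()
            .insert(key.path.clone(), (key.version, Arc::clone(&meta)));
        meta
    }
}

/// Minimal busy-polling executor so the sketch runs without a runtime crate.
fn block_on<F: Future>(fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A moka-backed implementor could instead delegate the body to `moka::future::Cache::get_with`, which coalesces concurrent initializations for the same key, keeping thundering-herd protection out of DataFusion itself.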
   
   Basically, I think we should be aiming for:
   1. Not implementing any more sophisticated caching in DataFusion itself (e.g. the thundering herd problem can be solved in downstream crates)
   2. Updating the APIs in DataFusion so that downstream crates can implement that more sophisticated caching
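
As a sketch of point 1, a downstream crate could add thundering-herd protection (request coalescing, often called singleflight) on top of whatever cache API DataFusion exposes. `Coalescing` and `get_or_load` are hypothetical names, and for brevity this uses threads and `std::sync::OnceLock` rather than async; the coalescing idea carries over to an async cache. `OnceLock::get_or_init` guarantees that only one initializer runs even when several threads race on the same cell.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical downstream wrapper that coalesces concurrent loads per key
/// ("singleflight"), so a burst of misses on one file triggers one load.
struct Coalescing<K, V> {
    // One cell per key; the map lock is held only long enough to fetch the cell.
    cells: Mutex<HashMap<K, Arc<OnceLock<Arc<V>>>>>,
}

impl<K: Eq + Hash + Clone, V> Coalescing<K, V> {
    fn new() -> Self {
        Self {
            cells: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the value for `key`, running `load` at most once per key even
    /// when many threads miss at the same time.
    fn get_or_load(&self, key: &K, load: impl FnOnce() -> V) -> Arc<V> {
        // The map lock guard is released at the end of this statement.
        let cell = Arc::clone(self.cells.lock().unwrap().entry(key.clone()).or_default());
        // Threads racing on the same cell block here until the winning load
        // finishes; `OnceLock` guarantees only one initializer runs.
        Arc::clone(cell.get_or_init(|| Arc::new(load())))
    }
}
```

Eviction and invalidation are deliberately left out. If `load` panics, the cell stays uninitialized and the next caller retries.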
   
   Given that the current API is synchronous, here are two ideas:
   
   1. Switch the cache API to be `async` (or to some more explicit `Future`-based API)
   2. Make `DFParquetMetadata` a trait, or otherwise extensible, so you can override the behavior of [`fetch_metadata`](https://github.com/apache/datafusion/blob/a7c2f7d3f844cd1ff76c8edb9d472d7979779153/datafusion/datasource-parquet/src/metadata.rs#L129-L128)
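
Idea 2 could look roughly like the hypothetical sketch below. `ParquetMetadataSource`, `read_footer`, `Plain` and `CachingSource` are illustrative names that do not reflect the actual `DFParquetMetadata` code. The point is that a provided `fetch_metadata` keeps the built-in behavior while a downstream type can override it wholesale. A busy-polling `block_on` stands in for an async runtime.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for decoded Parquet footer metadata.
#[derive(Clone, Debug)]
struct Metadata {
    num_rows: usize,
}

/// Hypothetical extension point in place of a concrete struct: the provided
/// `fetch_metadata` is the built-in behavior, and implementors may override it.
trait ParquetMetadataSource {
    /// Hypothetical low-level footer read from storage.
    async fn read_footer(&self, path: &str) -> Metadata;

    /// Default behavior (built-in caching omitted for brevity).
    async fn fetch_metadata(&self, path: &str) -> Metadata {
        self.read_footer(path).await
    }
}

/// Built-in implementor that keeps the default `fetch_metadata`.
struct Plain {
    reads: AtomicUsize,
}

impl ParquetMetadataSource for Plain {
    async fn read_footer(&self, _path: &str) -> Metadata {
        self.reads.fetch_add(1, Ordering::SeqCst);
        Metadata { num_rows: 100 }
    }
}

/// Downstream implementor that overrides `fetch_metadata` with its own policy
/// (here a simple map; it could equally coalesce loads or use a remote cache).
struct CachingSource<S> {
    inner: S,
    cache: Mutex<HashMap<String, Metadata>>,
}

impl<S: ParquetMetadataSource> ParquetMetadataSource for CachingSource<S> {
    async fn read_footer(&self, path: &str) -> Metadata {
        self.inner.read_footer(path).await
    }

    async fn fetch_metadata(&self, path: &str) -> Metadata {
        // Check our own cache first; the lock is released at the end of the statement.
        let hit = self.cache.lock().unwrap().get(path).cloned();
        if let Some(meta) = hit {
            return meta;
        }
        let meta = self.inner.fetch_metadata(path).await;
        self.cache.lock().unwrap().insert(path.to_string(), meta.clone());
        meta
    }
}

/// Minimal busy-polling executor so the sketch runs without a runtime crate.
fn block_on<F: Future>(fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With a provided method, existing implementors keep the default behavior, and only crates that need a different policy override `fetch_metadata`.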
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
