alamb commented on issue #22553: URL: https://github.com/apache/datafusion/issues/22553#issuecomment-4621802768
> Implementor (a moka-backed cache, the default, etc.) owns the singleflight + loader internally. fetch_metadata in datafusion/datasource-parquet/src/metadata.rs switches to calling this single method instead of the explicit get / is_valid_for / put sequence. Async-fn-in-trait is already established by infer_stats_and_ordering, so no new ergonomic terrain on that front.

Basically I think we should be aiming to:

1. Not implement any more sophisticated caching in DataFusion itself (e.g. the thundering herd problem can be solved in downstream crates)
2. Update the APIs in DataFusion to allow for that more sophisticated caching

Given the current API is sync, here are two ideas:

1. Switch the cache API to be `async` (or some more explicit `Future`-based API)
2. Make `DFParquetMetadata` a trait / extensible so you can override the behavior of [`fetch_metadata`](https://github.com/apache/datafusion/blob/a7c2f7d3f844cd1ff76c8edb9d472d7979779153/datafusion/datasource-parquet/src/metadata.rs#L129-L128)
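
To make the first idea concrete, here is a rough, std-only sketch of what a single async "get or load" cache method might look like. Every name in it (`AsyncMetadataCache`, `get_or_load`, `FileMeta`, `NaiveCache`, `fake_fetch`, `block_on`, `demo`) is hypothetical and not part of DataFusion's current API. The naive implementation deliberately does no request coalescing; the point is only that the trait shape would let a downstream crate add singleflight behavior behind the same method.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for parsed Parquet footer metadata.
#[derive(Debug, PartialEq)]
pub struct FileMeta {
    pub num_rows: u64,
}

/// Hypothetical async cache trait (an assumption, not DataFusion's API).
/// The implementor owns lookup, loading, and insertion, so it is free to
/// coalesce concurrent loads for the same key.
#[allow(async_fn_in_trait)]
pub trait AsyncMetadataCache {
    async fn get_or_load<F, Fut>(&self, path: &str, loader: F) -> Result<Arc<FileMeta>, String>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Result<FileMeta, String>>;
}

/// Naive implementation: a plain map with NO singleflight. Two concurrent
/// misses for the same path would both invoke their loader.
#[derive(Default)]
pub struct NaiveCache {
    entries: Mutex<HashMap<String, Arc<FileMeta>>>,
}

impl AsyncMetadataCache for NaiveCache {
    async fn get_or_load<F, Fut>(&self, path: &str, loader: F) -> Result<Arc<FileMeta>, String>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Result<FileMeta, String>>,
    {
        // The lock guard is a temporary dropped at the end of this statement,
        // so it is never held across the `.await` below.
        let cached = self.entries.lock().unwrap().get(path).cloned();
        if let Some(hit) = cached {
            return Ok(hit);
        }
        let meta = Arc::new(loader().await?);
        self.entries
            .lock()
            .unwrap()
            .insert(path.to_string(), Arc::clone(&meta));
        Ok(meta)
    }
}

/// Pretend loader that counts how many times it actually runs.
async fn fake_fetch(counter: &AtomicU32) -> Result<FileMeta, String> {
    counter.fetch_add(1, Ordering::SeqCst);
    Ok(FileMeta { num_rows: 42 })
}

/// Minimal busy-polling executor so the sketch needs no async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<T>(fut: impl Future<Output = T>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}

/// Looks up the same path twice; returns (row count, number of loader runs).
pub fn demo() -> (u64, u32) {
    let cache = NaiveCache::default();
    let loads = AtomicU32::new(0);
    let counter = &loads;
    let first = block_on(cache.get_or_load("a.parquet", move || fake_fetch(counter))).unwrap();
    let second = block_on(cache.get_or_load("a.parquet", move || fake_fetch(counter))).unwrap();
    // The second call is served from the cache, so both point at one entry.
    assert!(Arc::ptr_eq(&first, &second));
    (second.num_rows, loads.load(Ordering::SeqCst))
}
```

With a shape like this, the call site would hand over a loader and never perform a separate get / validate / put sequence, and a downstream implementation could solve the thundering herd problem without further changes in DataFusion.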
