zhuqi-lucas commented on issue #22553:
URL: https://github.com/apache/datafusion/issues/22553#issuecomment-4560615079

   Thanks for the thoughtful pushback @alamb. Looking at 
`infer_stats_and_ordering` as the comparison point was actually really useful — 
it pointed at a cleaner API shape for the cache trait. Two thoughts:
   
   **On overriding `FileFormat::infer_stats_and_ordering` itself**: that path 
is only hit during planning. At execution time, `ParquetSource::open(...) → 
ParquetFileReader::get_metadata()` goes through `fetch_metadata` directly 
without touching `infer_stats_and_ordering`. That's where the hot-pod 
thundering herd actually fires for our workload — many concurrent queries 
hitting the same parquet file in a cold-cache window — so a `FileFormat`-level 
override misses it.
   
   **On API ergonomics**: completely agree the `FnOnce + Future` callback is 
awkward. Borrowing the shape of `infer_stats_and_ordering` for the cache trait 
would be much cleaner — typed inputs in, typed output out, no closure:
   
   ```rust
   async fn fetch_metadata(
       &self,
       store: &Arc<dyn ObjectStore>,
       location: &Path,
       current_meta: &ObjectMeta,
       options: Option<&ArrowReaderOptions>,
   ) -> Result<CachedFileMetadataEntry>;
   ```
   
   Implementor (a moka-backed cache, the default, etc.) owns the singleflight + 
loader internally. `fetch_metadata` in 
`datafusion/datasource-parquet/src/metadata.rs` switches to calling this single 
method instead of the explicit `get` / `is_valid_for` / `put` sequence. 
Async-fn-in-trait is already established by `infer_stats_and_ordering`, so no 
new ergonomic terrain on that front.
   
   Happy to prototype this shape and the closure version, see which one reads 
cleaner before committing to a direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to