adriangb commented on issue #8227:
URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2719197476

I haven't been able to keep close track of the (amazing, far-reaching) work on `StatisticsV2`, but I have a couple of questions based on my limited understanding:
- DataFusion will be able to populate stats for Parquet files by reading the Parquet metadata, right? Will it do so lazily, or eagerly in one go?
- Can I give DataFusion pre-computed statistics? In particular, I have some of the stats in a secondary index. Could I pull some of those stats and feed them into DataFusion so that it doesn't have to fetch Parquet metadata to, e.g., decide it can skip processing a file altogether? If I fed in partial stats, would DataFusion be able to fetch the remainder, e.g., for other columns or for files I didn't provide stats for?