adriangb commented on issue #8227: URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2719197476
I haven't been able to keep much track of the (amazing, far-reaching) work on `StatisticsV2`, but I have a couple of questions from my limited understanding:

- DataFusion will be able to populate stats from Parquet files by reading the Parquet metadata, right? Will it do so lazily, or eagerly in one go?
- Can I give DataFusion pre-computed statistics? In particular, I have some of the stats in a secondary index. Could I pull some of those stats and feed them into DataFusion so it doesn't have to fetch Parquet metadata to, e.g., decide it can skip processing a file altogether? If I did feed in partial stats, would DataFusion be able to fetch the remainder, e.g. for other columns or for files I didn't provide stats for?
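
To make the second question concrete, here is a rough sketch of the flow being asked about. Every name in it (`ColumnRange`, `StatsCache`, `provide`, `get_or_fetch`, `can_skip_file`) is hypothetical and made up for illustration; none of it is DataFusion or `StatisticsV2` API, and it assumes simple integer min/max stats only. Stats supplied up front (e.g. from a secondary index) are used as-is, and anything missing is fetched lazily through a caller-supplied closure standing in for a Parquet metadata read.

```rust
use std::collections::HashMap;

/// Min/max range for one column in one file.
/// Hypothetical type for illustration; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ColumnRange {
    pub min: i64,
    pub max: i64,
}

/// Per-(file, column) stats: filled from pre-computed values first,
/// and from file metadata only when a lookup misses.
pub struct StatsCache {
    known: HashMap<(String, String), ColumnRange>,
    /// Counts how often we had to fall back to reading file metadata.
    pub metadata_fetches: usize,
}

impl StatsCache {
    pub fn new() -> Self {
        StatsCache {
            known: HashMap::new(),
            metadata_fetches: 0,
        }
    }

    /// Feed in a pre-computed stat, e.g. pulled from a secondary index.
    pub fn provide(&mut self, file: &str, column: &str, range: ColumnRange) {
        self.known
            .insert((file.to_string(), column.to_string()), range);
    }

    /// Return the cached stat if present; otherwise call `fetch`
    /// (a stand-in for reading Parquet metadata) and cache the result.
    pub fn get_or_fetch<F>(&mut self, file: &str, column: &str, fetch: F) -> Option<ColumnRange>
    where
        F: FnOnce(&str, &str) -> Option<ColumnRange>,
    {
        let key = (file.to_string(), column.to_string());
        if let Some(range) = self.known.get(&key) {
            return Some(*range);
        }
        self.metadata_fetches += 1;
        let fetched = fetch(file, column)?;
        self.known.insert(key, fetched);
        Some(fetched)
    }
}

/// True when a predicate `column = value` cannot match any row in the file.
/// Unknown stats never allow skipping.
pub fn can_skip_file(range: Option<ColumnRange>, value: i64) -> bool {
    match range {
        Some(r) => value < r.min || value > r.max,
        None => false,
    }
}
```

With something like this, a file whose stats came from the index could be skipped without touching its footer at all, while files or columns the index doesn't cover would only pay for a metadata fetch when a query actually needs them.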