alamb commented on PR #17275: URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3271688853
> I found a potential performance regression with `parquet 56.1.0`. Now more data pages will be returned if their size is less than the execution batch size. For example:

Thanks @nuno-faria -- this is a great find. @XiangpengHao and I purposely added a setting that allows disabling the cache for precisely this reason.

So what I think is needed here is a way to turn this setting off via a DataFusion setting as well, which is what I was trying to say with:

> Add new Parquet option to control the size of the predicate cache

Let me give this a try and see if we can get it working better.