korowa commented on PR #11627: URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2254539608
> 1000 partitions

@alamb this is also a bit unexpected: the default number of rows after which the check fires is 100_000, and it is applied per partition (each partition normally processes at least 100k rows without skipping aggregation), while the total number of rows in the file is ~100kk, i.e. ~100M (if I'm not mistaken). So this optimization should not help in this case, since with 1000 partitions each partition will read only ~100_000 rows anyway :thinking:
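To make the arithmetic concrete, here is a rough sketch of the reasoning. The helper names are hypothetical and not part of DataFusion, and it assumes rows are split evenly across partitions, which a real scan will only approximate:

```rust
/// Hypothetical helper: rows each partition sees, assuming an even split.
fn rows_per_partition(total_rows: u64, partitions: u64) -> u64 {
    total_rows / partitions
}

/// Hypothetical helper: rows left in a partition after the probe threshold
/// has been consumed. Only these rows could benefit from skipping
/// partial aggregation.
fn rows_after_probe(rows_in_partition: u64, probe_threshold: u64) -> u64 {
    rows_in_partition.saturating_sub(probe_threshold)
}
```

Under these assumptions, `rows_per_partition(100_000_000, 1000)` is 100_000, and `rows_after_probe(100_000, 100_000)` is 0, so no rows would remain to be skipped once the check has fired.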