thinkharderdev commented on issue #11451: URL: https://github.com/apache/datafusion/issues/11451#issuecomment-2227464020
> But I still don't quite understand about `I think this is probably not a big issue if you are setting the partition parallelism to the number` mentioned above... Mind explaining it in more detail?

Honestly, that was just a guess on my part, so it may very well be that even with one partition per core you would see the same issue. But I was thinking that with one partition per core the IO and CPU work are pipelined reasonably well. The table scan will do some IO, then decode the data and process it through the rest of the pipeline. By the time the CPU work is required, there is no more IO in flight to block. With any repartitions, though, that would get complicated, so I'm not really sure.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org