paleolimbot commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2873182441

Just a quick two cents that I quite like the approach in this PR. I've been keeping an eye on this for a use case not dissimilar to the "llm" one, where we want to call an API that only exposes an `async` function; right now the mechanism to accomplish that involves something like a custom physical exec, which is verbose. A second problem is how to maintain CPU usage in the presence of an incredibly slow scalar function. I would personally optimize the case where a slow scalar function spends most of its time waiting on IO differently from the case where it happens to burn a lot of CPU, so I quite like that this PR keeps those two cases separate.

Optimizing slow things is sort of an infinite rabbit hole depending on how slow your function is and how well you can estimate that at the various optimizer stages, but I like the approach here (an option to cut the batch size just for async things, if I'm reading it correctly).
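To make the first point concrete, here's a rough sketch of the workaround a synchronous scalar function is pushed toward today without first-class async support. This is not DataFusion's UDF API; `fake_llm_call`, the dedicated runtime, and `sync_scalar_fn` are illustrative assumptions. The point is that each row blocks a thread while waiting on IO, which is exactly the CPU-utilization problem described above.

```rust
// Cargo deps (assumed): tokio = { version = "1", features = ["rt-multi-thread"] }
use std::sync::OnceLock;
use tokio::runtime::Runtime;

// Pretend async-only client call, e.g. an LLM or embedding API.
// In reality this would await a network request.
async fn fake_llm_call(input: &str) -> String {
    format!("echo: {input}")
}

// Dedicated runtime so we can block from a synchronous context without
// touching whatever runtime the query engine itself is driving.
fn io_runtime() -> &'static Runtime {
    static RT: OnceLock<Runtime> = OnceLock::new();
    RT.get_or_init(|| Runtime::new().expect("failed to build runtime"))
}

// The synchronous "scalar function" body: one blocking wait per row, so a
// whole thread sits idle on IO instead of doing useful work on the batch.
fn sync_scalar_fn(values: &[String]) -> Vec<String> {
    values
        .iter()
        .map(|v| io_runtime().block_on(fake_llm_call(v)))
        .collect()
}

fn main() {
    let out = sync_scalar_fn(&["hello".to_string(), "world".to_string()]);
    println!("{out:?}");
}
```

An async-aware UDF mechanism lets the engine await those calls (and batch or limit them) instead of parking threads per row, which is why keeping the IO-bound case separate from the CPU-bound case seems like the right split.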
Just a quick two cents that I quite like the approach in this PR. I've been keeping an eye on this for a use case not dissimilar to the "llm" use case, where we want to use an API that only exposes an `async` function and right now it seems like the mechanism to accomplish that involves something like a custom physical exec which is verbose. A second problem is how to maintain CPU usage in the presence of an incredibly slow scalar function. I would personally optimize the case where a slow scalar function is spending most of its time waiting on IO vs the case where a slow scalar function is doing something that happens to involve a lot of CPU differently, and so I quite like that this PR keeps those two cases separate. Optimizing slow things is sort of an infinite rabbit hole depending on how slow your function is and how well you can estimate that at the various optimizer stages, but I like the approach here (an option to cut the batch size just for async things, if I'm reading it correctly). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org