paleolimbot commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2873182441

Just a quick two cents that I quite like the approach in this PR. I've been keeping an eye on this for a use case not dissimilar to the "llm" one, where we want to call an API that only exposes an `async` function; right now the mechanism to accomplish that involves something like a custom physical exec, which is verbose. A second problem is how to maintain CPU usage in the presence of an incredibly slow scalar function. I would personally optimize the case where a slow scalar function spends most of its time waiting on IO differently from the case where it happens to burn a lot of CPU, so I quite like that this PR keeps those two cases separate.

Optimizing slow things is sort of an infinite rabbit hole depending on how slow your function is and how well you can estimate that at the various optimizer stages, but I like the approach here (an option to cut the batch size just for async things, if I'm reading it correctly).
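To make the first point concrete, here's a rough sketch of the workaround a synchronous scalar function is pushed toward today without first-class async support. This is not DataFusion's UDF API; `fake_llm_call`, the dedicated runtime, and `sync_scalar_fn` are illustrative assumptions. The point is that each row blocks a thread while waiting on IO, which is exactly the CPU-utilization problem described above.

```rust
// Cargo deps (assumed): tokio = { version = "1", features = ["rt-multi-thread"] }
use std::sync::OnceLock;
use tokio::runtime::Runtime;

// Pretend async-only client call, e.g. an LLM or embedding API.
// In reality this would await a network request.
async fn fake_llm_call(input: &str) -> String {
    format!("echo: {input}")
}

// Dedicated runtime so we can block from a synchronous context without
// touching whatever runtime the query engine itself is driving.
fn io_runtime() -> &'static Runtime {
    static RT: OnceLock<Runtime> = OnceLock::new();
    RT.get_or_init(|| Runtime::new().expect("failed to build runtime"))
}

// The synchronous "scalar function" body: one blocking wait per row, so a
// whole thread sits idle on IO instead of doing useful work on the batch.
fn sync_scalar_fn(values: &[String]) -> Vec<String> {
    values
        .iter()
        .map(|v| io_runtime().block_on(fake_llm_call(v)))
        .collect()
}

fn main() {
    let out = sync_scalar_fn(&["hello".to_string(), "world".to_string()]);
    println!("{out:?}");
}
```

An async-aware UDF mechanism lets the engine await those calls (and batch or limit them) instead of parking threads per row, which is why keeping the IO-bound case separate from the CPU-bound case seems like the right split.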
Just a quick two cents that I quite like the approach in this PR. I've been keeping an eye on this for a use case not dissimilar to the "llm" use case, where we want to use an API that only exposes an `async` function and right now it seems like the mechanism to accomplish that involves something like a custom physical exec which is verbose. A second problem is how to maintain CPU usage in the presence of an incredibly slow scalar function. I would personally optimize the case where a slow scalar function is spending most of its time waiting on IO vs the case where a slow scalar function is doing something that happens to involve a lot of CPU differently, and so I quite like that this PR keeps those two cases separate. Optimizing slow things is sort of an infinite rabbit hole depending on how slow your function is and how well you can estimate that at the various optimizer stages, but I like the approach here (an option to cut the batch size just for async things, if I'm reading it correctly). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org