Thanks for the reply! I had originally thought this would incur the cost of spinning up a VM every time the UDF is called, but thinking about it again, you might be right. If I make the VM accessible via a transient property on the UDF class, would it be initialized only once per executor? Or would it be once per task?
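To make the question concrete, here's a rough sketch of the initialization pattern I have in mind (all names are made up; the stand-in function below is where a real GraalJS context would go). A lazily initialized static holder is set up once per classloader, i.e. once per executor JVM, regardless of how many tasks or rows run through the UDF:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class JsUdfSketch {
    // Visible counter so we can observe how many times init actually runs.
    static final AtomicInteger initCount = new AtomicInteger();

    // Holder idiom: ENGINE is created lazily, once per classloader/JVM
    // (i.e. once per executor), not once per task and not once per row.
    private static final class EngineHolder {
        static final Function<String, String> ENGINE = createEngine();
    }

    private static Function<String, String> createEngine() {
        initCount.incrementAndGet();     // count real initializations
        return s -> s.toUpperCase();     // stand-in for evaluating JS in a VM
    }

    // The UDF body: every call reuses the same engine instance.
    public static String call(String input) {
        return EngineHolder.ENGINE.apply(input);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) { // simulate many rows in one executor
            call("row" + i);
        }
        // prints initializations=1
        System.out.println("initializations=" + initCount.get());
    }
}
```

A transient instance field would instead be re-initialized once per deserialized UDF instance, which is why a static/companion-object holder seems like the safer way to get once-per-executor behavior, if I understand the serialization model correctly.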
I was also worried that this would mean you end up paying a lot in SerDe cost if you send each row over to the VM one by one.

On Mon, Jun 27, 2022 at 10:02 PM Sean Owen <sro...@gmail.com> wrote:

> Rather than reimplement a new UDF, why not indeed just use an embedded
> interpreter? If something can turn JavaScript into something executable, you
> can wrap that in a normal Java/Scala UDF and go.
>
> On Mon, Jun 27, 2022 at 10:42 PM Matt Hawes <hawes.i...@gmail.com> wrote:
>
>> Hi all, I'm thinking about trying to implement the ability to write Spark
>> UDFs using JavaScript.
>>
>> For the use case I have in mind, a lot of the code is already written in
>> JavaScript, so it would be very convenient to be able to call it
>> directly from Spark.
>>
>> I wanted to post here first, before I start digging into the UDF code, to
>> see if anyone has attempted this already or if people have thoughts on it.
>> I couldn't find anything in the Jira. I'd be especially appreciative of any
>> pointers towards relevant sections of the code to get started!
>>
>> My rough plan is to do something similar to how Python UDFs work (as I
>> understand them), i.e. call out to a JavaScript process, potentially
>> something like GraalJS: https://github.com/oracle/graaljs.
>>
>> I understand there's probably a long discussion to be had here with
>> regards to making this part of Spark core, but I wanted to start that
>> discussion. :)
>>
>> Best,
>> Matt
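On the SerDe point, my rough idea would be to batch rows before crossing the boundary, the way Python UDFs amortize overhead with Arrow record batches, rather than paying the cost per row. A hypothetical sketch (the `sendToVm` helper is made up; it stands in for serializing a batch, invoking the JS VM, and deserializing results):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Count how many times we cross the JVM <-> VM boundary.
    static int boundaryCrossings = 0;

    // Stand-in for one round trip to the JS VM: each call pays the
    // fixed serialization/deserialization overhead exactly once.
    static List<String> sendToVm(List<String> batch) {
        boundaryCrossings++;
        List<String> out = new ArrayList<>();
        for (String row : batch) out.add(row.toUpperCase());
        return out;
    }

    // Process rows in fixed-size batches instead of one call per row.
    static List<String> processBatched(List<String> rows, int batchSize) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            List<String> batch =
                rows.subList(i, Math.min(i + batchSize, rows.size()));
            results.addAll(sendToVm(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add("row" + i);
        processBatched(rows, 100);
        // 1000 rows, but only 10 boundary crossings instead of 1000
        System.out.println("crossings=" + boundaryCrossings);
    }
}
```

So per-row SerDe cost might be avoidable in the same way the Python runner avoids it, assuming the JS side can accept a batch per call.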