Re: UDF Lifecycle

Till Westmann Sun, 17 Nov 2019 11:29:23 -0800

It seems that it'd be nice if we had a step (similar to the
initialization step) in the deployment lifecycle as well. And I guess
that we'd need a corresponding clean-up step for un-deployment as well.


Does that make sense? If so, should we file an improvement for this?


Cheers,
Till

On 17 Nov 2019, at 9:29, Xikui Wang wrote:

The UDF interface has an initialize method which is invoked once per
lifecycle. Putting the model loading code in there can probably solve
your problem. The initialization is done per query (Hyracks job). For
example, if you do

SELECT mylib#myudf(t) FROM Tweets t;
where the Tweets dataset contains 100 tweets, the initialization
method will be called once and the evaluate method will be invoked 100
times. For feeds with attached UDFs, the initialization happens only
once, when the feed starts.
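
A minimal sketch of that call pattern in plain Java, for illustration
only. The names ScalarUdf, ClassifierUdf, Model and runQuery are
hypothetical stand-ins and are not the AsterixDB API; the only thing
taken from the description above is that initialize runs once per query
and evaluate runs once per record.

```java
import java.util.ArrayList;
import java.util.List;

public class UdfLifecycleSketch {

    /** Hypothetical stand-in for a UDF interface; NOT the real AsterixDB API. */
    interface ScalarUdf {
        void initialize();              // called once per query (job)
        String evaluate(String input);  // called once per record
    }

    /** Hypothetical stand-in for a large model that is slow to deserialize. */
    static final class Model {
        String classify(String text) {
            return text.contains("!") ? "excited" : "neutral";
        }
    }

    /** Loads the model in initialize() and only reuses it in evaluate(). */
    static final class ClassifierUdf implements ScalarUdf {
        int initializeCalls = 0;
        int evaluateCalls = 0;
        private Model model;

        @Override
        public void initialize() {
            initializeCalls++;
            model = new Model(); // the expensive deserialization would go here
        }

        @Override
        public String evaluate(String input) {
            evaluateCalls++;
            return model.classify(input);
        }
    }

    /** Simulates one query: initialize once, then evaluate every record. */
    static List<String> runQuery(ScalarUdf udf, List<String> records) {
        udf.initialize();
        List<String> results = new ArrayList<>();
        for (String record : records) {
            results.add(udf.evaluate(record));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> tweets = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            tweets.add("tweet " + i);
        }
        ClassifierUdf udf = new ClassifierUdf();
        runQuery(udf, tweets);
        System.out.println("initialize calls: " + udf.initializeCalls); // 1
        System.out.println("evaluate calls: " + udf.evaluateCalls);     // 100
    }
}
```

Running a second query against the same sketch would call initialize
again, which is the per-query cost discussed in this thread.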

Best,
Xikui

On Sun, Nov 17, 2019 at 6:44 AM Torsten Bergh Moss <[email protected]> wrote:
Dear developers,
I am trying to build a machine learning-based UDF for classification.
This involves loading in a model that has been trained offline, which in
practice basically is deserialization of a big object. This process of
deserialization takes a significant amount of time, but it only "needs"
to happen once, and after that the model can do the classification
rather rapidly.
Therefore, in order to avoid having to load the model every time the UDF
is called, I am wondering where in the UDF lifecycle I can do the
loading in order to achieve a "load model once, classify infinitely"
scenario, and how to implement it. I am assuming it should be done
somewhere inside the factory-function relationship, but I am not sure
where/how and can't seem to find a lot of documentation on it.


All help is appreciated, thanks!


Best wishes,

Torsten
