The UDF interface has an initialize method that is invoked once per
lifecycle. Putting the model-loading code there should solve your
problem. Initialization is done once per query (Hyracks job). For example,
if you run

SELECT mylib#myudf(t) FROM Tweets t;

and the Tweets dataset contains 100 tweets, the initialize method will be
called once and the evaluate method will be invoked 100 times. For feeds
with attached UDFs, initialization happens only once, when the feed starts.
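To make the lifecycle concrete, here is a minimal, self-contained Java sketch. It does not use the AsterixDB library. `ScalarUdf`, `Model`, `ClassifierUdf`, `serialize`, and `runQuery` are hypothetical stand-ins, and the real AsterixDB interface differs in its signatures (for example, in how arguments are read and results are written). The point is where the expensive step lives: the model is deserialized once in `initialize`, kept in a field, and reused by every `evaluate` call. `runQuery` simulates a single query over a dataset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class UdfLifecycleSketch {

    // Hypothetical, simplified stand-in for the UDF lifecycle (not the AsterixDB API).
    interface ScalarUdf<I, O> {
        void initialize() throws Exception;   // once per query (or once per feed start)
        O evaluate(I input) throws Exception; // once per record
        void deinitialize();                  // once, after the last record
    }

    // Hypothetical trained model; stands in for the big object trained offline.
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;

        String classify(String text) {
            return text.contains("!") ? "excited" : "neutral";
        }
    }

    // Produces the bytes that would normally live in a model file on disk.
    static byte[] serialize(Model m) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        }
        return bytes.toByteArray();
    }

    static final class ClassifierUdf implements ScalarUdf<String, String> {
        private final byte[] serializedModel;
        private Model model;  // loaded once, reused by every evaluate call
        int loadCount = 0;    // counts deserializations, for the demo only

        ClassifierUdf(byte[] serializedModel) {
            this.serializedModel = serializedModel;
        }

        @Override
        public void initialize() throws IOException, ClassNotFoundException {
            // The slow step: deserialize the model exactly once.
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(serializedModel))) {
                model = (Model) in.readObject();
            }
            loadCount++;
        }

        @Override
        public String evaluate(String tweet) {
            return model.classify(tweet); // fast path: no loading here
        }

        @Override
        public void deinitialize() {
            model = null; // release the model when the job ends
        }
    }

    // Simulates the runtime driving a UDF for one query over a dataset.
    static <I, O> List<O> runQuery(ScalarUdf<I, O> udf, List<I> records) throws Exception {
        udf.initialize();
        List<O> out = new ArrayList<>();
        try {
            for (I record : records) {
                out.add(udf.evaluate(record));
            }
        } finally {
            udf.deinitialize();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> tweets = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            tweets.add(i % 2 == 0 ? "hi!" : "hi");
        }
        ClassifierUdf udf = new ClassifierUdf(serialize(new Model()));
        List<String> labels = runQuery(udf, tweets);
        System.out.println(udf.loadCount + " load, " + labels.size() + " evaluations");
    }
}
```

Running `main` prints `1 load, 100 evaluations`, matching the 100-tweet example: one initialization, then one evaluate call per record.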

Best,
Xikui

On Sun, Nov 17, 2019 at 6:44 AM Torsten Bergh Moss <
[email protected]> wrote:

> Dear developers,
>
>
> I am trying to build a machine learning-based UDF for classification. This
> involves loading in a model that has been trained offline, which in
> practice basically is deserialization of a big object. This process of
> deserialization takes a significant amount of time, but it only "needs" to
> happen once, and after that the model can do the classification rather
> rapidly.
>
>
> Therefore, in order to avoid having to load the model every time the UDF
> is called, I am wondering where in the UDF lifecycle I can do the loading
> in order to achieve a "load model once, classify infinitely"-scenario, and
> how to implement it. I am assuming it should be done somewhere inside the
> factory-function-relationship, but I am not sure where/how and can't seem
> to find a lot of documentation on it.
>
>
> All help is appreciated, thanks!
>
>
> Best wishes,
>
> Torsten
>
