Torsten,

I see a couple of possible approaches here:

1. Make your function operate on arrays of values instead of individual
values. You'll probably need a GROUP BY in your query to create an array
(using ARRAY_AGG() or a GROUP AS variable). Then pass that array to your
function, which would process it and return a result array, and finally
unnest that output array to get the original cardinality back. (A sketch of
what such a query could look like follows after item 2.)

2. Alternatively, you could try creating a new runtime for the ASSIGN
operator that'd pass batches of input tuples to a new kind of function
evaluator. You'd need to provide replacements for
AssignPOperator/AssignRuntimeFactory, and also modify InlineVariablesRule [1]
so it doesn't inline those ASSIGNs. (A rough sketch of what such a batched
evaluator's contract could look like is below.)
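
For option 1, here's a rough sketch of what such a query could look like. The
dataset (tweets), the grouping field (batch_id), and the array-in/array-out
UDF (classify_batch) are all hypothetical placeholders you'd have to provide:

    SELECT VALUE label
    FROM (
        -- one UDF call per group: classify_batch takes an array of texts
        -- and returns an array of labels of the same length
        SELECT classify_batch(ARRAY_AGG(t.text)) AS labels
        FROM tweets t
        GROUP BY t.batch_id
    ) AS b
    UNNEST b.labels AS label;

How large each batch gets is determined entirely by how batch_id is derived,
so you'd want to compute it so that each group fits in your GPU's memory.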
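
For option 2, the essence of the change is replacing the per-tuple evaluator
contract with a batched one. Purely as an illustration (this interface does
not exist in Hyracks; all names below are made up), the new evaluator kind
could look something like:

    import java.util.List;

    import org.apache.hyracks.api.exceptions.HyracksDataException;
    import org.apache.hyracks.data.std.api.IPointable;
    import org.apache.hyracks.dataflow.common.data.accessors.IFrameTupleReference;

    // Hypothetical batch-oriented counterpart to IScalarEvaluator; a
    // batching replacement for AssignRuntimeFactory would buffer incoming
    // tuples and flush them through a single call here.
    public interface IBatchScalarEvaluator {
        // One call per accumulated batch (e.g. a single GPU inference),
        // producing exactly one result per input tuple, in order.
        void evaluateBatch(List<IFrameTupleReference> inputs,
                           List<IPointable> results) throws HyracksDataException;
    }

The batching ASSIGN runtime would decide the flush point (a frame boundary or
a configured batch size) and write the results back out tuple by tuple.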

[1] 
https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/InlineVariablesRule.java#L144

Thanks,
-- Dmitry
 

On 2/27/20, 2:02 PM, "Torsten Bergh Moss" <[email protected]> wrote:

    Greetings everyone,
    
    
    I'm experimenting a lot with UDFs utilizing neural network inference,
mainly for classification of tweets. The problem is that running the UDFs in
a one-at-a-time fashion severely under-exploits the capacity of GPU-powered
NNs, and every UDF call also incurs the latency of moving data from the CPU
to the GPU and back, which makes for poor performance.
    
    
    Ideally it would be possible to use the UDF to process records in a
micro-batch fashion, letting them accumulate until a certain batch size is
reached (as big as my GPU's memory can handle) before passing the data along
to the neural network to get the outputs.
    
    
    Is there a way to accomplish this with the current UDF framework (either
in Java or Python)? If not, where would I have to start to develop such a
feature?
    
    
    Best wishes,
    
    Torsten Bergh Moss
    
