Torsten, I see a couple of possible approaches here:
1. Make your function operate on arrays of values instead of primitive values. You'll probably need a GROUP BY in your query to create an array (using ARRAY_AGG() or a GROUP AS variable). Then pass that array to your function, which would process it and return a result array. Finally, unnest that output array to get the cardinality back. (A rough sketch of this pattern is in the P.S. at the bottom of this message.)

2. Alternatively, you could try creating a new runtime for the ASSIGN operator that would pass batches of input tuples to a new kind of function evaluator. You'd need to provide replacements for AssignPOperator/AssignRuntimeFactory. You would also need to modify InlineVariablesRule [1] so it doesn't inline those ASSIGNs.

[1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules/InlineVariablesRule.java#L144

Thanks,
-- Dmitry

On 2/27/20, 2:02 PM, "Torsten Bergh Moss" <[email protected]> wrote:

    Greetings everyone,

    I'm experimenting a lot with UDFs that use neural network inference, mainly for classification of tweets. The problem is that running the UDFs one record at a time severely under-exploits the capacity of GPU-powered neural networks, and there is a certain latency associated with moving data from the CPU to the GPU and back every time the UDF is called, which leads to poor performance.

    Ideally it would be possible to use the UDF to process records in a micro-batch fashion, letting them accumulate until a certain batch size is reached (as big as my GPU's memory can handle) before passing the data along to the neural network to get the outputs.

    Is there a way to accomplish this with the current UDF framework (either in Java or Python)? If not, where would I have to start to develop such a feature?

    Best wishes,
    Torsten Bergh Moss
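P.S. In case a concrete example helps for approach 1, here is a rough SQL++ sketch. The dataset name (Tweets), its text and batch_id fields, and the classify_batch() UDF are made up for illustration; classify_batch() is assumed to take an array of strings and return an array of labels of the same length and in the same order:

    -- batch_id is whatever key you use to split the data into
    -- GPU-sized batches; ARRAY_AGG collects each batch into one array
    SELECT VALUE label
    FROM (
        SELECT classify_batch(ARRAY_AGG(t.text)) AS labels
        FROM Tweets AS t
        GROUP BY t.batch_id
    ) AS b
    UNNEST b.labels AS label;

The inner query hands each whole batch to the UDF in a single call, and the outer UNNEST flattens the returned array so you get one label per input tweet again.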
