[
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne resolved CASSANDRA-7888.
-----------------------------------------
Resolution: Not a Problem
I believe we've pretty much answered all the questions raised in this ticket in
other tickets (we don't use reflection anymore for scalar functions, and we'll
define aggregate functions as a composition of existing scalar functions so
there is nothing specific to worry about for them). So closing this.
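
To picture what "a composition of existing scalar functions" could look like,
here is a minimal sketch. It assumes an aggregate is described by an initial
state, a scalar state function applied once per row, and a scalar final
function applied once at the end. The names ComposedAggregate, stateFunction
and finalFunction are hypothetical and are not taken from the Cassandra code
base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical illustration: an aggregate assembled from two scalar functions.
final class ComposedAggregate<S, T, R> {
    private final S initialState;
    private final BiFunction<S, T, S> stateFunction; // called once per row
    private final Function<S, R> finalFunction;      // called once at the end

    ComposedAggregate(S initialState,
                      BiFunction<S, T, S> stateFunction,
                      Function<S, R> finalFunction) {
        this.initialState = initialState;
        this.stateFunction = stateFunction;
        this.finalFunction = finalFunction;
    }

    R aggregate(List<T> values) {
        S state = initialState;
        for (T value : values) {
            state = stateFunction.apply(state, value);
        }
        return finalFunction.apply(state);
    }
}

final class ComposedAggregateDemo {
    // Average expressed as a composition; the state is {sum, count}.
    static double average(List<Double> values) {
        ComposedAggregate<double[], Double, Double> avg = new ComposedAggregate<>(
            new double[] {0.0, 0.0},
            (state, v) -> new double[] {state[0] + v, state[1] + 1},
            state -> state[1] == 0 ? 0.0 : state[0] / state[1]);
        return avg.aggregate(values);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1.0, 2.0, 3.0, 6.0))); // prints 3.0
    }
}
```

With this shape the aggregate itself needs no special loading mechanism: only
the two scalar functions have to be defined and type checked.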
> Decide the best way to define user-define functions
> ---------------------------------------------------
>
> Key: CASSANDRA-7888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7888
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benjamin Lerer
> Labels: cql
> Fix For: 3.0
>
>
> The goal of this ticket is to decide on the best way, in terms of ease of
> use and performance, to define User Defined Scalar Functions and User
> Defined Aggregate Functions.
> I would like to clarify this point before we add support for User Defined
> Aggregate Functions as part of #4914.
> The current version of UDF supports only scalar functions, and does so by
> allowing a user to provide classes containing static methods that can then
> be loaded as functions within Cassandra.
> The problem with the static method approach is that it forces us internally
> to perform a method call via reflection for each call of the function. So if
> the request loads 10 000 rows, the static method will be called 10 000 times
> via reflection.
> As the Method object is cached, the HotSpot compiler will optimize the method
> call after a certain number of iterations. Nevertheless, from a performance
> point of view it is definitely not an optimal situation.
> Ideally, a proper solution would confine the performance impact to function
> loading time (when the function is first added or at startup) and keep it
> out of query time.
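
For illustration, the per-row reflective call described above looks roughly
like the sketch below. UserFunctions and half are hypothetical stand-ins for a
user-provided class and its static method; this is not the actual Cassandra
loading code.

```java
import java.lang.reflect.Method;

final class ReflectiveScalarCall {
    // Hypothetical stand-in for a user-provided class with a static method.
    static final class UserFunctions {
        public static double half(double d) {
            return d / 2;
        }
    }

    // Looks the method up once (the cached Method object), then invokes it
    // through reflection once per row, boxing the argument and the result.
    static double sumOverRows(int rows) {
        try {
            Method method = UserFunctions.class.getMethod("half", double.class);
            double total = 0;
            for (int row = 0; row < rows; row++) {
                total += (Double) method.invoke(null, (double) row);
            }
            return total;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not call user function", e);
        }
    }

    public static void main(String[] args) {
        // 10 000 rows means 10 000 calls to Method.invoke.
        System.out.println(sumOverRows(10_000));
    }
}
```

Every iteration pays for the access checks, the argument array and the boxing
until the JIT compiler decides to optimize the call site.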
> The first solution to that problem would be to force the designer of a
> new function to implement a specific interface like:
> {code}
> public interface UserDefinedScalarFunction
> {
>     Object execute(Object... args);
> }
> {code}
> or, for aggregate functions:
> {code}
> public interface UserDefinedAggregateFunction
> {
>     // Returns a new aggregate holding its own state.
>     UserDefinedAggregate newAggregate();
>
>     public interface UserDefinedAggregate
>     {
>         void add(Object... args);
>         Object getResult();
>         void reset();
>     }
> }
> {code}
> This will allow us to create one object instance via reflection and then
> reuse that object every time the function is called.
> The problem with that approach is that we lose the type safety of the
> arguments and of the return type, and as a consequence we will be able to
> detect a problem only at runtime.
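
A minimal sketch of this first approach, assuming the UserDefinedScalarFunction
interface proposed above. AbsScalarFunction and the load helper are
hypothetical and only serve the illustration.

```java
// Interface as proposed in the ticket description.
interface UserDefinedScalarFunction {
    Object execute(Object... args);
}

// Hypothetical user-provided implementation.
final class AbsScalarFunction implements UserDefinedScalarFunction {
    public AbsScalarFunction() {
    }

    @Override
    public Object execute(Object... args) {
        // The cast is only checked when the function actually runs.
        return Math.abs((Double) args[0]);
    }
}

final class InterfaceApproachDemo {
    // Reflection is used once, at load time, to create the instance.
    static UserDefinedScalarFunction load(String className) {
        try {
            return (UserDefinedScalarFunction)
                Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load " + className, e);
        }
    }

    // Returns true when a wrongly typed argument is only caught at runtime.
    static boolean failsAtRuntime(UserDefinedScalarFunction function, Object arg) {
        try {
            function.execute(arg);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UserDefinedScalarFunction abs = load(AbsScalarFunction.class.getName());
        System.out.println(abs.execute(-2.5));              // plain interface call
        System.out.println(failsAtRuntime(abs, "not a double"));
    }
}
```

Calls after loading are ordinary interface calls, but nothing in the signature
stops a caller from passing a String where a Double is expected.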
> The second solution would be to force the designer of a new function to
> create a new class in which the method to execute is marked with an
> annotation:
> {code}
> public class AbsFunction
> {
>     @Execute
>     public double abs(double d)
>     {
>         return Math.abs(d);
>     }
> }
> {code}
> The same approach for aggregate functions will give:
> {code}
> public class AvgFunction
> {
>     private double sum;
>     private int count;
>
>     @Add
>     public void addValue(double d)
>     {
>         sum += d;
>         count++;
>     }
>
>     @Get
>     public double getAvg()
>     {
>         if (count == 0)
>             return 0;
>         return sum / count;
>     }
>
>     @Reset
>     public void clear()
>     {
>         sum = 0;
>         count = 0;
>     }
> }
> {code}
> For this approach to work we need to use, at loading time, code generation
> to extend the provided class with the methods needed to adapt the class to
> our framework.
> The disadvantage of it is that we will need to add a new library like
> Javassist to the libraries used by C*.
> Its advantage is that it will allow us to detect type mismatches at creation
> time.
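
The creation-time check mentioned above could be sketched as follows. This
covers only the signature validation, using plain reflection; the code
generation step (for example with Javassist) is not shown. The @Execute
annotation is declared here for the example, and the validate helper is
hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

// Marker annotation, kept at runtime so it can be inspected at load time.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Execute {
}

// User-provided class, as in the ticket description.
final class AbsFunction {
    @Execute
    public double abs(double d) {
        return Math.abs(d);
    }
}

final class AnnotationApproachDemo {
    // Finds the single @Execute method and checks its signature against the
    // declared function types, before any query runs.
    static Method validate(Class<?> functionClass, Class<?> returnType, Class<?>... argTypes) {
        Method found = null;
        for (Method method : functionClass.getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Execute.class)) {
                continue;
            }
            if (found != null) {
                throw new IllegalArgumentException("more than one @Execute method");
            }
            found = method;
        }
        if (found == null) {
            throw new IllegalArgumentException("no @Execute method in " + functionClass.getName());
        }
        if (found.getReturnType() != returnType
                || !Arrays.equals(found.getParameterTypes(), argTypes)) {
            throw new IllegalArgumentException("signature mismatch: " + found);
        }
        return found;
    }

    static boolean isValid(Class<?> functionClass, Class<?> returnType, Class<?>... argTypes) {
        try {
            validate(functionClass, returnType, argTypes);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(AbsFunction.class, double.class, double.class)); // prints true
        System.out.println(isValid(AbsFunction.class, double.class, int.class));    // prints false
    }
}
```

A function declared as taking an int would be rejected here when it is
created, rather than failing on the first row of a query.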
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)