[ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056291#comment-14056291 ]
Robert Stupp commented on CASSANDRA-7395:
-----------------------------------------

I like the approach of defining UDFs (and writing their code, as proposed in CASSANDRA-7526) directly in CQL, although it requires adding UDFs to the system keyspace and implicitly requires schema agreement, as for tables, indexes, UDTs etc.
And if we agree that CASSANDRA-7526 is the right way to do it, then we must agree that Java 8 is required for C* 3.0 (except for the "pure Java" idea below).

Using something like {{CREATE FUNCTION sum(a bigint, b bigint) AS ( return a + b; )}} is much easier to understand and to maintain than {{AS foo.bar.Class.method}}.

Bundles could be implemented like this:
{noformat}
CREATE BUNDLE Math AS (
  FUNCTION sum(a bigint, b bigint) {
    return a + b;
  }
);
{noformat}

But instead of using Nashorn as the first step, it would be possible to use "plain" Java code via [Apache BCEL|https://commons.apache.org/proper/commons-bcel/], which does not have the Java 8 requirement.

Adding the language as a parameter could look like {{FUNCTION sum(a bigint, b bigint) AS JAVA ...}} or {{AS JAVASCRIPT}} or Groovy or whatever.

The _deterministic_ option was intended for the use of UDFs in functional indexes - functional indexes require deterministic functions, whereas "normal" execution does not. So I'd like to keep this flag even in the {{CREATE FUNCTION}} or {{CREATE BUNDLE ... FUNCTION}} syntax, but assume one of the two (deterministic or non-deterministic) as the default.

In conclusion, a bundle in CQL syntax using BCEL could look like this:
{noformat}
CREATE OR UPDATE BUNDLE MyUDFs (
  FUNCTION double sin(input double) AS JAVA {
    return input == null ? null : Math.sin(input);
  }
  FUNCTION float sin(input float) AS JAVA {
    // Math.sin returns a double, so narrow it explicitly for the float variant
    return input == null ? null : (float) Math.sin(input);
  }
  NON DETERMINISTIC FUNCTION double random() AS JAVA {
    return Math.random();
  }
)
{noformat}

But we should keep some "backdoor" to pass the raw blob to a UDF - {{fooToBlob}} sounds straightforward, if it's cheap.
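A minimal sketch of what the raw-blob path could look like for {{bigint}}, assuming the blob is simply the 8-byte big-endian encoding of the value. The class and method names are hypothetical and not part of Cassandra or of any patch on this ticket. It only illustrates that, for fixed-width types, producing and reading the raw form amounts to a single buffer write or read:

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names are assumptions, not Cassandra APIs.
final class RawBlobSketch {

    // Encode a bigint as 8 big-endian bytes (ByteBuffer's default byte order).
    static ByteBuffer bigintToBlob(long value) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putLong(value);
        buf.flip(); // make the 8 written bytes readable
        return buf;
    }

    // What a UDF that takes a blob argument might do with the raw bytes.
    static long blobToBigint(ByteBuffer raw) {
        if (raw.remaining() != 8)
            throw new IllegalArgumentException("expected 8 bytes, got " + raw.remaining());
        // Absolute read: does not move the buffer's position.
        return raw.getLong(raw.position());
    }

    public static void main(String[] args) {
        ByteBuffer blob = bigintToBlob(42L);
        System.out.println(blob.remaining() + " bytes, value " + blobToBigint(blob));
    }
}
```

Variable-length or nested types (collections, UDTs) would need more work than this, which is where the cost question becomes relevant.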
If it's not cheap, it is at least possible, and if there is demand we can add a special "raw" wildcard type for UDF parameters later.

UDFs could be held in a table:
{noformat}
CREATE TABLE system.user_functions (
  bundle      text,        -- bundle name
  signature   text,        -- function name + argument types ; might be an MD5 hash of these
  name        text,        -- function name
  arguments   list<text>,  -- list of CQL argument types
  return_type text,        -- CQL return type
  language    text,        -- programming language
  body        text,        -- code
  PRIMARY KEY ( ( bundle ), signature )
);
{noformat}

Altogether, this approach does not expose internals to UDFs, and using/porting {{DataType}} + {{TypeCodec}} + {{CassandraTypeParser}} from the Java Driver to parse "complex" CQL types is not a big deal - primitive types can easily be parsed using {{CQL3Type.Native.valueOf(parsedTypeDef.toUpperCase())}}.

As a "marketing bullet list":
* pure CQL functionality
* no C* internals exposed
* support for "pure Java" plus scripting languages
* raw type representation support (using {{fooToBlob}})
* no periodic polling of the filesystem or system tables
* UDFs distributed "transparently" using schema agreement
* no tooling necessary - cqlsh and everything that supports CQL is enough
* UDF development help could be integrated, for example, in "DevCenter", which would itself compile a UDF bundle and allow testing / execution of individual functions - since it's based on Eclipse, it might even be possible to "debug" UDFs in Java and in the scripting languages Nashorn supports - but that's stuff for another ticket...
* Access rules can be enforced using Java {{SecureClassLoader}} (UDF invocation surrounded with {{Thread.setContextClassLoader(...)}})

Drawbacks:
* no official support for using external code
* cluster schema agreement on UDFs is necessary
* changes to UDF bundles force compilation on each node - but that should not be a big issue, since UDFs should be small and efficient - they are not "full blown libraries"

I'm still not sure whether prepared statements must be invalidated if the bundle changes. As long as a UDF with the same signature exists, execution can continue - and if the bundle/function is removed, execution will fail (which is ok).

Yes - I really like the "pure CQL" idea - simple to understand - easy for users to start with - the explanation would need just two bullet points on a slide. I think it's worth the BCEL and schema agreement effort.

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914,
> CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of
> ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in
> the SELECT clause of a CQL query. That's it.
> By "pure" I mean, must depend only on the input parameters. No side effects.
> No exposure to C* internals. Column values in, result out.
> http://en.wikipedia.org/wiki/Pure_function

--
This message was sent by Atlassian JIRA
(v6.2#6252)