[jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)

Sylvain Lebresne (JIRA) Wed, 09 Jul 2014 03:56:24 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056100#comment-14056100
 ]


Sylvain Lebresne commented on CASSANDRA-7395:
---------------------------------------------

I'm sorry to disagree but I'd rather not depend on AbstractType/TypeParser, 
even at first. We shouldn't redo the mistakes of the past by making internal 
stuff part of the API. And AbstractType/TypeParser are very much internal 
classes. Also, regarding using ByteBuffer for arguments, while I've mentioned
it myself initially, I don't think it's a good idea anymore: we should have
proper java types right away if we can help it.

Which leads me to the following suggestion: reuse the java driver (we probably 
won't use a whole lot of stuff, probably mostly the DataType class and the 
related ones). It already knows about all CQL types with a well defined mapping 
to java types, handles collections, UDT, ... And its APIs are meant to be 
stable/public. And though we should ignore that for this ticket, we will be 
able to reuse the mapper for UDTs, which is neat.

The other point I'd like us to consider is the fact that this ticket is only a 
first step, but I'm strongly hoping we can get CASSANDRA-7526 not too long 
after this. So I think we should make as many concepts consistent between the 
two as possible. Which kind of means not relying on java annotations but rather
keeping concepts in CQL as much as possible. Typically, we could support a
syntax like:
{noformat}
CREATE FUNCTION sum (a bigint, b bigint) AS my.company.Functions.sum;
{noformat}
From that, we'll just make sure the function pointed to takes two integers and
returns one.
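
To make that check a bit more concrete, here is a minimal sketch of what the
signature validation could look like with plain reflection. Everything in it is
hypothetical: {{UdfSignatureCheck}} is not an existing Cassandra class,
{{Functions}} stands in for {{my.company.Functions}}, and since the statement
above declares no return type, the sketch assumes the expected return type
(bigint, i.e. a java long) is known by some other means.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the user-provided my.company.Functions class.
final class Functions {
    private Functions() {}

    // CQL bigint is a 64-bit signed integer, so it maps to a Java long.
    public static long sum(long a, long b) {
        return a + b;
    }
}

// Hypothetical validator (an assumption for illustration, not Cassandra code).
final class UdfSignatureCheck {
    private UdfSignatureCheck() {}

    // Returns true if 'owner' declares a public static method called 'name'
    // whose parameter types are exactly 'argTypes' and whose return type is
    // exactly 'returnType'.
    static boolean matches(Class<?> owner, String name,
                           Class<?> returnType, Class<?>... argTypes) {
        try {
            Method m = owner.getDeclaredMethod(name, argTypes);
            int mods = m.getModifiers();
            return Modifier.isPublic(mods)
                && Modifier.isStatic(mods)
                && m.getReturnType() == returnType;
        } catch (NoSuchMethodException e) {
            // No method with that name and those parameter types.
            return false;
        }
    }
}
```

For the statement above, the call would be
{{UdfSignatureCheck.matches(Functions.class, "sum", long.class, long.class, long.class)}},
and a mismatch would presumably make the CREATE FUNCTION fail.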

Granted, it's slightly slower to define each function individually than to have
them automatically defined from the class itself if you have crap-tons of them,
but I don't think that's a big deal.

If one wants to update a function definition, we can have a specific syntax
(follow-up ticket):
{noformat}
UPDATE FUNCTION sum (a bigint, b bigint) AS my.company.Functions.sum2;
{noformat}
Again, the advantages are that it's explicit and will neatly extend to
CASSANDRA-7526.

Having function definitions not be extraneous to CQL also means that we can
enforce some security rules (that is, to have per-user rights to
define/update/remove functions). This also makes it clear when, say,
notifications should be sent for newly added functions, etc...

Regarding bundle/namespaces, I agree it's good to have some and I'm not sure 
what would be the best way to make them fit with what's above. Maybe it's 
enough to say that if you define a function {{Math.sum}} it's part of the 
{{Math}} namespace without having to define namespaces explicitly. Or maybe we
want a specific syntax to create them, I'm not sure. But here again, I'd rather
have us think about CASSANDRA-7526. Defining bundles by using java annotations
will not work there and so I'd rather define the notion in CQL directly.
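
As an illustration of the first option only (implicit namespaces), here is a
minimal sketch of how a qualified name like {{Math.sum}} could be split. The
{{FunctionName}} class and the choice of an empty string for "no namespace" are
assumptions made for the example, not a proposal for the actual representation.

```java
// Hypothetical holder for a possibly namespaced function name.
final class FunctionName {
    final String namespace; // empty string when the name is unqualified
    final String name;

    private FunctionName(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Splits at the last dot: "Math.sum" gives namespace "Math" and name
    // "sum"; a name without any dot gets the empty namespace.
    static FunctionName parse(String qualified) {
        int dot = qualified.lastIndexOf('.');
        if (dot < 0) {
            return new FunctionName("", qualified);
        }
        return new FunctionName(qualified.substring(0, dot),
                                qualified.substring(dot + 1));
    }
}
```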

bq. Additionally, it allows for some optimizations. For example, a
collectionLength() function could simply deserialize the first four bytes.

While true, the number of optimizations that can be done without deserializing 
is not that numerous. I don't see very many outside of the length of 
collections in fact.  So I'd be fine just saying that we provide those out of 
the box (I'll note that for functions that want to work on the raw bytes of a 
value, the proper way to do it is to declare the function on blob, and to use 
the textAsBlob, intAsBlob, ... functions).
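
For reference, the kind of shortcut being discussed could be sketched as below.
It assumes, as the quoted remark does, that the serialized collection starts
with a 4-byte big-endian element count; the real layout depends on the
serialization format in use, and {{CollectionLength}} is a hypothetical name,
not an existing class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (an assumption for illustration, not Cassandra code).
final class CollectionLength {
    private CollectionLength() {}

    // Reads the element count from the first four bytes of the serialized
    // value without deserializing any element. Uses an absolute get, so the
    // buffer's position is left untouched for other readers.
    static int collectionLength(ByteBuffer serialized) {
        if (serialized.remaining() < 4) {
            throw new IllegalArgumentException("need at least 4 bytes for the count");
        }
        return serialized.getInt(serialized.position());
    }
}
```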

bq. @UDF(deterministic = false)

I'll note that imo we should ignore this entirely for UDFs: let's just execute 
every UDF at execution time (i.e. always use isPure() == false internally). The 
fact we have a distinction internally is questionable in the first place anyway.


> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, 
> CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of 
> ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in 
> the SELECT clause of a CQL query.  That's it.
> By "pure" I mean, must depend only on the input parameters.  No side effects. 
>  No exposure to C* internals.  Column values in, result out.  
> http://en.wikipedia.org/wiki/Pure_function



--
This message was sent by Atlassian JIRA
(v6.2#6252)
