[jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)

Tyler Hobbs (JIRA) Tue, 08 Jul 2014 14:06:55 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055532#comment-14055532 ]


Tyler Hobbs commented on CASSANDRA-7395:
----------------------------------------

Thanks for your work so far, Robert!  I haven't had time to look over your code 
changes in detail, but I'll try to answer some of your questions and add a few 
comments.

bq. Type parsing in C* is programmatically only possible from String to 
AbstractType. Parsing of CQL3 types is done by Cql.g, which "constructs" 
AbstractType. Is it ok to limit type names to the AbstractType syntax? I have, 
however, added some simple "CQL3 parsing" using CQL3Types.Native.valueOf().

I think it's okay to support only classname-style types (as parsed by 
TypeParser) for now and defer support for CQL-style types to another ticket.
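To make the two syntaxes concrete, here is a hypothetical sketch in the spirit of the stopgap Robert describes: a few CQL3 native names mapped to their classname-style equivalents, with every other name passed through unchanged on the assumption that it is already classname-style. {{UdfTypeNames}} and {{toClassNameStyle}} are invented for illustration; deferring CQL-style types as suggested amounts to dropping the map.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Cassandra): maps a few CQL3 native type
// names to the classname-style names that TypeParser understands. Any other
// name is assumed to already be classname-style and is returned unchanged.
final class UdfTypeNames {
    private static final String MARSHAL = "org.apache.cassandra.db.marshal.";

    // Illustrative subset only; not an exhaustive list of CQL3 native types.
    private static final Map<String, String> CQL_TO_CLASSNAME = Map.of(
            "int", MARSHAL + "Int32Type",
            "bigint", MARSHAL + "LongType",
            "text", MARSHAL + "UTF8Type",
            "boolean", MARSHAL + "BooleanType",
            "double", MARSHAL + "DoubleType",
            "blob", MARSHAL + "BytesType");

    static String toClassNameStyle(String typeName) {
        String mapped = CQL_TO_CLASSNAME.get(typeName.toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : typeName;
    }

    public static void main(String[] args) {
        System.out.println(toClassNameStyle("int"));
        System.out.println(toClassNameStyle(MARSHAL + "UUIDType"));
    }
}
```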

bq. Shall UDFs support list/set/map/udf/tuple types - even nested types? It 
makes the current approach of using Java types in UDFs somewhat complicated. An 
intermediate solution might be to just pass the ByteBuffer - but that would not 
be consistent. Using list/set/map with primitive types is not a big deal. I 
think that these "high level" types are a bit "out of scope" of pure UDFs.

I disagree.  I prefer to start with a more general and powerful solution and 
work on friendliness later.  This means ByteBuffers get passed in.  Although 
it's not a friendly interface, it supports all Cassandra types out of the box.  
Additionally, it allows for some optimizations.  For example, a 
{{collectionLength()}} function could simply deserialize the first four bytes 
(the element count) instead of the whole collection.
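A rough sketch of what a ByteBuffer-in, ByteBuffer-out UDF and that {{collectionLength()}} optimization could look like. The {{ByteBufferFunction}} interface is hypothetical (not Cassandra's actual API), and the sketch assumes the serialized collection starts with a 4-byte big-endian element count, as described above.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical UDF shape (not Cassandra's actual interface): serialized
// argument values in, serialized result out, so every Cassandra type can be
// passed without a per-type Java mapping.
interface ByteBufferFunction {
    ByteBuffer execute(List<ByteBuffer> args);
}

// Hypothetical collectionLength(). Assumes the serialized collection begins
// with a 4-byte big-endian element count, so only that prefix is read and the
// elements themselves are never deserialized.
final class CollectionLength implements ByteBufferFunction {
    @Override
    public ByteBuffer execute(List<ByteBuffer> args) {
        ByteBuffer collection = args.get(0);
        if (collection == null) {
            return null; // null column value in, null out
        }
        // Absolute read: the argument buffer's position is left untouched.
        int count = collection.getInt(collection.position());
        ByteBuffer result = ByteBuffer.allocate(4);
        result.putInt(0, count); // result is a serialized 4-byte int
        return result;
    }

    public static void main(String[] args) {
        // A fake serialized list of three ints: [count] then [len][value] x 3.
        ByteBuffer list = ByteBuffer.allocate(4 + 3 * 8);
        list.putInt(3);
        for (int v = 10; v <= 30; v += 10) {
            list.putInt(4).putInt(v);
        }
        list.flip();
        ByteBuffer length = new CollectionLength().execute(List.of(list));
        System.out.println(length.getInt(0));
    }
}
```

The cost of {{collectionLength()}} stays constant regardless of collection size, which is the kind of optimization a typed Java interface would hide.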

The best way to make this more user-friendly is debatable.  I don't feel like 
we need to go overboard on this right away, because UDFs (hopefully) aren't 
something you will be writing every day.

bq. Passing "any" type to a UDF (UDF gets a TypeAndData class instance that 
contains the AbstractType + ByteBuffer) would require changing the 
Function.execute(List<ByteBuffer>) signature. Is this a feature worth that 
change? I'm a bit skeptical about the benefit of this feature.

I'm not sure about the utility of it either.  I don't think you would need to 
change the {{execute()}} signature, because the argument types (and hence the 
return type) for selectors can be determined when preparing.  Regardless, for 
the sake of keeping things simple, I would keep this out of v1.
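As a hypothetical sketch of why {{execute()}} can stay as is: if an overload is looked up by name plus argument types while the statement is prepared, the return type is already fixed before any row is read, so no per-value type wrapper needs to travel with the data. {{FunctionSignature}}, {{FunctionRegistry}} and {{resolveReturnType}} are invented names, and types are plain strings here for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: an overload is identified by its name plus its argument
// types. Those types come from the selected columns' metadata, so they are
// known when the SELECT is prepared.
record FunctionSignature(String name, List<String> argTypes) {
    FunctionSignature {
        argTypes = List.copyOf(argTypes); // immutable, content-based equals
    }
}

final class FunctionRegistry {
    private final Map<FunctionSignature, String> returnTypes = new HashMap<>();

    void register(String name, List<String> argTypes, String returnType) {
        returnTypes.put(new FunctionSignature(name, argTypes), returnType);
    }

    // Called at prepare time. Because the overload and its return type are
    // resolved here, execution can keep taking a plain List<ByteBuffer>.
    String resolveReturnType(String name, List<String> argTypes) {
        String returnType = returnTypes.get(new FunctionSignature(name, argTypes));
        if (returnType == null) {
            throw new IllegalArgumentException("No function " + name + argTypes);
        }
        return returnType;
    }

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register("add", List.of("Int32Type", "Int32Type"), "Int32Type");
        System.out.println(registry.resolveReturnType("add", List.of("Int32Type", "Int32Type")));
    }
}
```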

bq. Is the approach of loading UDF bundles (jar files) into a C* system_udf 
keyspace using a tool ok?

Although what you have is pretty cool, I'm tempted to stick with the simpler 
approach of dropping jars in a directory and triggering a reload through 
JMX/nodetool.  That has far fewer security concerns, doesn't require a new 
system keyspace, and doesn't require additional tooling.  I don't feel like 
UDFs will change so frequently that the process for loading them needs to be 
super smooth.  The only time UDFs will change frequently is when developing, 
and that's usually done against a single local instance, where dropping in a 
jar is easy.
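A minimal sketch of the "drop jars in a directory, then reload" approach, assuming one {{URLClassLoader}} per reload so that replaced jars are picked up. {{UdfLoaderMBean}}, {{UdfLoader}} and {{reloadUdfs()}} are hypothetical names for what a JMX operation (and a nodetool command wrapping it) would call; the MBean registration itself is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical management interface; would be exposed as a JMX operation.
interface UdfLoaderMBean {
    int reloadUdfs();
}

// Hypothetical loader: scans a directory for *.jar files and swaps in a fresh
// class loader so that replaced jars are picked up on the next reload.
final class UdfLoader implements UdfLoaderMBean {
    private final Path udfDirectory;
    private volatile URLClassLoader classLoader;

    UdfLoader(Path udfDirectory) {
        this.udfDirectory = udfDirectory;
    }

    @Override
    public synchronized int reloadUdfs() {
        List<URL> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(udfDirectory, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toUri().toURL());
            }
            URLClassLoader previous = classLoader;
            classLoader = new URLClassLoader(jars.toArray(new URL[0]), UdfLoader.class.getClassLoader());
            // Simplification: a real implementation would first make sure no
            // in-flight query still uses classes from the old loader.
            if (previous != null) {
                previous.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jars.size(); // number of UDF jars now visible
    }

    ClassLoader currentClassLoader() {
        return classLoader;
    }
}
```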

One other topic that I didn't see addressed by the changes (although I may have 
missed it) is dealing with existing prepared statements when a UDF changes.  I 
feel like we should invalidate existing prepared statements that use the UDF 
when that happens.  Drivers can handle that situation pretty seamlessly, and it 
would make it easier for operators to deploy a bugfix to a UDF.  I would be 
interested to hear other opinions, though.
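One hypothetical way to do the invalidation: have the prepared-statement cache remember which UDFs each statement calls, and evict exactly those statements when a UDF is replaced. A later lookup miss is what would produce an "unprepared" error, which drivers already handle by re-preparing. {{PreparedStatementCache}} and {{onFunctionReplaced}} are invented names, not Cassandra's actual classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical prepared-statement cache that also records which UDFs each
// statement calls, so replacing a UDF evicts only the dependent statements.
final class PreparedStatementCache {
    private final Map<String, String> queriesById = new HashMap<>();
    private final Map<String, Set<String>> statementIdsByFunction = new HashMap<>();

    synchronized void put(String statementId, String query, Set<String> functionsUsed) {
        queriesById.put(statementId, query);
        for (String function : functionsUsed) {
            statementIdsByFunction.computeIfAbsent(function, f -> new HashSet<>()).add(statementId);
        }
    }

    // A miss here is what would make the server answer "unprepared", at which
    // point a driver can transparently re-prepare against the new UDF.
    synchronized String get(String statementId) {
        return queriesById.get(statementId);
    }

    // Returns how many cached statements were evicted.
    synchronized int onFunctionReplaced(String function) {
        Set<String> ids = statementIdsByFunction.remove(function);
        if (ids == null) {
            return 0;
        }
        int evicted = 0;
        for (String id : ids) {
            // An id may already be gone if it was evicted via another UDF.
            if (queriesById.remove(id) != null) {
                evicted++;
            }
        }
        return evicted;
    }
}
```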

Other than that, I personally like the bundle/namespace approach, and I think 
the annotations are quite nice.

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, 
> CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of 
> ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in 
> the SELECT clause of a CQL query.  That's it.
> By "pure" I mean, must depend only on the input parameters.  No side effects. 
>  No exposure to C* internals.  Column values in, result out.  
> http://en.wikipedia.org/wiki/Pure_function



--
This message was sent by Atlassian JIRA
(v6.2#6252)
