[ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055532#comment-14055532 ]

Tyler Hobbs commented on CASSANDRA-7395:
----------------------------------------

Thanks for your work so far, Robert! I haven't had time to look over your code changes in detail, but I'll try to answer some of your questions and add a few comments.

bq. Type parsing in C* is programmatically only possible from String to AbstractType. Parsing of CQL3 types is done by Cql.q, which "constructs" AbstractType. Is it ok to limit type names to the AbstractType syntax? Although I've added some simple "CQL3 parsing" using a CQL3Types.Native.valueOf()

I think it's okay to support only classname-style types (e.g. TypeParser) for now and defer support for cql-style types to another ticket.

bq. Shall UDFs support list/set/map/udf/tuple types - even nested types? It makes the current approach of using Java types in UDFs somewhat complicated. An intermediate solution might be to just pass the ByteBuffer - but that would not be consistent. Using list/set/map with primitive types is not a big deal. I think that these "high level" types are a bit "out of scope" of pure UDFs.

I disagree. I prefer to start with a more general and powerful solution and work on friendliness later. This means ByteBuffers get passed in. Although it's not a friendly interface, it supports all Cassandra types out of the box. Additionally, it allows for some optimizations. For example, a {{collectionLength()}} function could simply deserialize the first four bytes.

The best way to make this more user-friendly is debatable. I don't feel like we need to go overboard on this right away, because UDFs (hopefully) aren't something you will be writing every day.

bq. Passing "any" type to a UDF (UDF gets a TypeAndData class instance that contains the AbstractType + ByteBuffer) would require to change the Function.execute(List<ByteBuffer>)) signature. Is this a feature worth that change? I'm a bit skeptical about the benefit of this feature.

I'm not sure about the utility of it either.
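
As a rough illustration of the {{collectionLength()}} idea mentioned above, here is a minimal Java sketch. It assumes the serialized collection starts with a four-byte big-endian element count, which is what the comment implies. The class name {{CollectionLengthSketch}} and the method signature are hypothetical and not taken from the patch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not code from the attached patch.
final class CollectionLengthSketch {

    // Assumption: the first four bytes of the serialized collection hold
    // the element count as a big-endian int (ByteBuffer's default order).
    static int collectionLength(ByteBuffer serialized) {
        if (serialized == null || serialized.remaining() < 4) {
            throw new IllegalArgumentException("expected at least 4 readable bytes");
        }
        // Absolute read: leaves the buffer's position untouched, so the
        // caller can still hand the same buffer to other code.
        return serialized.getInt(serialized.position());
    }

    public static void main(String[] args) {
        // Build a fake serialized collection: count = 3, then opaque payload.
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(3);
        buf.put(new byte[12]);
        buf.flip();
        System.out.println(collectionLength(buf));
    }
}
```

The point of the sketch is that only the count prefix is read; none of the elements are deserialized, which is the optimization that passing raw ByteBuffers makes possible.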
I don't think you would need to change the {{execute()}} signature, because the argument types (and hence the return type) for selectors can be determined when preparing. Regardless, for the sake of keeping things simple, I would keep this out of v1.

bq. Is the approach to load UDF bundles (jar files) using a tool into C* system_udf keyspace ok?

Although what you have is pretty cool, I'm tempted to stick with the simpler approach of dropping jars in a directory and triggering a reload through JMX/nodetool. That has far fewer security concerns, doesn't require a new system keyspace, and doesn't require additional tooling. I don't feel like UDFs will change so frequently that the process for loading them needs to be super smooth. The only time UDFs will change frequently is when developing, and that's usually done against a single local instance, where dropping in a jar is easy.

One other topic that I didn't see addressed by the changes (although I may have missed it) is dealing with existing prepared statements when a UDF changes. I feel like we should invalidate existing prepared statements that use the UDF when that happens. Drivers can handle that situation pretty seamlessly, and it would make it easier for operators to deploy a bugfix to a UDF. I would be interested to hear other opinions, though.

Other than that, I personally like the bundle/namespace approach, and I think the annotations are quite nice.

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in the SELECT clause of a CQL query. That's it.
> By "pure" I mean, must depend only on the input parameters. No side effects. No exposure to C* internals. Column values in, result out. http://en.wikipedia.org/wiki/Pure_function

--
This message was sent by Atlassian JIRA
(v6.2#6252)