[
https://issues.apache.org/jira/browse/CASSANDRA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802845#comment-14802845
]
Sylvain Lebresne commented on CASSANDRA-10217:
----------------------------------------------
bq. The difference is that any custom expression which isn't used, is
effectively ignored and plays no part in filtering
I'm really not a fan. If we're gonna ignore something, let's not allow it.
Otherwise, it's really surprising for users (and arguably wrong) and we can't
change our mind later (if we decide ignoring wasn't the best thing to do)
without breaking backward compatibility. So I would have a rather strong
preference for only allowing 1 custom expression per query for now (again, if
only because it's easier to change our mind later to allow and ignore than the
reverse path).
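To make the "reject rather than ignore" option concrete, here is a minimal sketch of such a check. The {{CustomExpr}} and {{CustomExprValidation}} names are hypothetical stand-ins for illustration only, not the classes in the patch:

```java
import java.util.List;

// Hypothetical stand-in for a parsed expr(index, 'value') restriction;
// this is not the real Cassandra class.
final class CustomExpr {
    final String targetIndex;
    final String value;

    CustomExpr(String targetIndex, String value) {
        this.targetIndex = targetIndex;
        this.value = value;
    }
}

final class CustomExprValidation {
    // Rejects the query outright instead of silently ignoring extra expressions.
    static void validate(List<CustomExpr> customExpressions) {
        if (customExpressions.size() > 1) {
            throw new IllegalArgumentException(
                "Only one custom expression is allowed per query, got "
                + customExpressions.size());
        }
    }
}
```

Relaxing this later (allowing several expressions and ignoring the unused ones) would only accept more queries, so it would not break existing ones.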
Other remarks on the patch:
* I'd prefer getting the {{RowFilter.CustomExpression}} serialization as clean
as possible since we can't change that easily (without a messaging version
bump). Mostly, we have no use of either the column name or the operator, so we
shouldn't serialize them at all (to clarify, I don't care too much about using
fake values for them in the code itself, that can be changed later, I care
about not serializing those fake values).
* Seems we basically ignore custom expressions when talking to pre-3.0 nodes,
which could end up with surprising (wrong) results if a user mistakenly uses them
in a mixed-version cluster. I think I'd prefer throwing an exception saying you
need to upgrade all your nodes before doing that kind of query.
* I'd prefer the {{toString()}} method of {{RowFilter.CustomExpression}} to
return something that looks like the original expression since that's what
other expressions do (and it's sometimes included in some error messages). So
really, just {{String.format("expr(%s, %s)", targetIndex.name,
UTF8Type.instance.getString(value))}}.
* Currently, it seems all validation of custom expressions is left to the index's
{{searcherFor}} method. That means that if a custom index does not support
custom expressions, but a user uses one on this index by mistake, the index could
be considered searchable while it's not, which I suspect could easily lead to
some random exception in the {{searcherFor}} method. So I think it'd be better to
have indexes provide a method to say if they support custom expressions or not at
all. And while we could require a simple {{boolean
Index.supportCustomExpression()}} method, I have a slight preference for the
variant of my following point.
* We're currently assuming custom expressions are strings. It occurs to me that
we could allow any type by having the custom index actually tell us which type
it expects. That is, we'd add a {{customIndexValueType()}} method to {{Index}}.
And we can use that for my previous point by having a {{null}} return simply mean
that custom expressions are not supported. I could imagine an index using a UDT as
type for instance. Not a big deal though, and I'd be fine leaving that for
later if you prefer, but it sounds like an easy change that could be neat for some
custom implementations and that we'll probably forget to ever implement if we
don't do it now.
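As a rough sketch of how the last three points could fit together: the types below ({{ValueType}}, {{Utf8ValueType}}, {{CustomIndex}}, {{CustomExpression}}) are simplified, hypothetical stand-ins rather than the real Cassandra classes, and only the {{customIndexValueType()}} name and the {{toString()}} format are taken from the remarks above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-ins; none of these are the real Cassandra types.
interface ValueType {
    // Renders a serialized value for display.
    String getString(byte[] raw);
}

final class Utf8ValueType implements ValueType {
    public String getString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

interface CustomIndex {
    String name();

    // Type the index expects for custom expression values;
    // null means the index does not support custom expressions at all.
    ValueType customIndexValueType();
}

final class CustomExpression {
    private final CustomIndex targetIndex;
    private final ValueType valueType;
    private final byte[] value;

    CustomExpression(CustomIndex targetIndex, byte[] value) {
        ValueType type = targetIndex.customIndexValueType();
        if (type == null) {
            // Fail during validation rather than with a random error in the searcher.
            throw new IllegalArgumentException(
                "Index " + targetIndex.name() + " does not support custom expressions");
        }
        this.targetIndex = targetIndex;
        this.valueType = type;
        this.value = value;
    }

    // Mirrors the original expression so it reads well in error messages.
    @Override
    public String toString() {
        return String.format("expr(%s, %s)", targetIndex.name(), valueType.getString(value));
    }
}
```

With a single method, one call answers both "does this index support custom expressions?" and "how should the value be interpreted?", which is why the {{null}}-returning variant can replace a separate boolean method.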
> Support custom query expressions in SELECT
> ------------------------------------------
>
> Key: CASSANDRA-10217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10217
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Sam Tunnicliffe
> Assignee: Sam Tunnicliffe
> Fix For: 3.0.0 rc1
>
>
> (Broken out of CASSANDRA-10124)
> Custom index implementations often support query expressions which do not fit
> the structure of CQL. To support these, it has been necessary to add a fake
> column to the base table and query that using the custom syntax. Taking an
> example from the [Stratio
> docs|https://github.com/Stratio/cassandra-lucene-index]:
> {code}
> SELECT * FROM tweets WHERE lucene='{
> filter : {type:"range", field:"time", lower:"2014/04/25",
> upper:"2014/05/1"},
> query : {type:"phrase", field:"body", value:"big data gives
> organizations", slop:1}
> }'
> {code}
> The {{lucene}} field is a dummy column that has to be added to the table in
> order to associate the pre-3.0 row-based index with the {{tweets}} table. We
> could rewrite this query as:
> {code}
> SELECT * FROM tweets
> WHERE expr(lucene, '{filter : {type:"range", field:"time",
> lower:"2014/04/25", upper:"2014/05/1"},
> query : {type:"phrase", field:"body", value:"big data gives
> organizations", slop:1}}');
> {code}
> In this version the {{expr}} function takes 2 arguments: the first is the
> name of the index being targeted, {{lucene}}, and the second is the query
> string itself.
> Parsing and validation of those expressions would be delegated to the custom
> index implementations which support them.
> One thing to consider is index selection. If a query contains custom
> expressions, but the target index is not selected, C* has no way to use the
> custom expressions as a post-query filter, like it does with standard
> expressions & {{ALLOW FILTERING}}. To compensate for that, index selection
> should be weighted in favour of indexes targeted by custom expressions. At
> least in the first instance, we should also restrict queries to targeting a
> single index via custom expressions, i.e. disallow queries like {{SELECT *
> FROM t WHERE expr(index1, 'foo') AND expr(index2, 'bar')}}