[ 
https://issues.apache.org/jira/browse/CASSANDRA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-7016:
--------------------------------------
    Fix Version/s:     (was: 2.1.3)
                   3.0
           Labels: cql docs  (was: cql)

> can't map/reduce over subset of rows with cql
> ---------------------------------------------
>
>                 Key: CASSANDRA-7016
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7016
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core, Hadoop
>            Reporter: Jonathan Halliday
>            Assignee: Benjamin Lerer
>            Priority: Minor
>              Labels: cql, docs
>             Fix For: 3.0
>
>         Attachments: CASSANDRA-7016-V2.txt, CASSANDRA-7016-V3.txt, 
> CASSANDRA-7016.txt
>
>
> select ... where token(k) < x and token(k) >= y and k in (a,b) allow 
> filtering;
> This fails on 2.0.6: can't restrict k by more than one relation.
> In the context of map/reduce (hence the token range) I want to map over only 
> a subset of the keys (hence the 'in').  Pushing the 'in' filter down to cql 
> is substantially cheaper than pulling all rows to the client and then 
> discarding most of them.
> Currently this is possible only if the hadoop integration code is altered to 
> apply the AND on the client side and emit cql that contains only the 
> resulting filtered 'in' set.  The problem is not hadoop specific though, so 
> IMO it should really be solved in cql, not in the hadoop integration code.
> Most restrictions on cql syntax seem to exist to prevent unduly expensive 
> queries. This one seems to be doing the opposite.
> Edit: on further thought, and with reference to the code in 
> SelectStatement$RawStatement, it seems to me that token(k) and k should be 
> considered distinct entities for the purposes of processing restrictions. 
> That is, no restriction on the token should conflict with a restriction on 
> the raw key. That way any monolithic query in terms of k can be decomposed 
> into parallel chunks over the token range for the purposes of map/reduce 
> processing simply by appending an 'and token(k) ...' clause to the 
> existing 'where k ...'.
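
The decomposition described above can be sketched as follows. This is a
minimal illustration (the helper name, table name `ks.t`, and key column `k`
are hypothetical, not part of Cassandra or its drivers): it splits the full
Murmur3Partitioner token range into N sub-ranges and builds one CQL statement
per chunk, which is the client-side workaround the issue describes while the
'in' filter cannot be combined with token() restrictions server-side.

```python
# Hypothetical helper: split a full-table scan into parallel token-range
# chunks, as a map/reduce client would. The token bounds below are those
# of the Murmur3Partitioner (signed 64-bit range).
MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token

def token_range_queries(table, key_col, num_splits):
    """Return one CQL SELECT per token sub-range covering the whole ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // num_splits
    queries = []
    for i in range(num_splits):
        lo = MIN_TOKEN + i * span
        if i == num_splits - 1:
            # Last chunk closes the range inclusively so MAX_TOKEN is covered.
            upper = f"token({key_col}) <= {MAX_TOKEN}"
        else:
            upper = f"token({key_col}) < {lo + span}"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({key_col}) >= {lo} AND {upper};"
        )
    return queries

chunks = token_range_queries("ks.t", "k", 4)
```

Under CASSANDRA-7016's proposal, each of these chunk queries could also carry
the `k in (...)` restriction directly, instead of filtering keys client-side.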



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
