Re: Multi-column range scans

Ken Hancock Mon, 14 Jul 2014 08:04:22 -0700

I don't think your query is doing what he wants.  Your query will correctly
set the starting point, but will also return larger interval_id's but with
lower skill_levels:


cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (1402359300000, 5);

 skill      | interval_id   | skill_level | skill_count
------------+---------------+-------------+-------------
 Complaints | 1402359300000 |           5 |          20
 Complaints | 1402359300000 |           8 |          30
 Complaints | 1402359300000 |          10 |           1
 Complaints | 1402359400000 |           2 |          10
 Complaints | 1402359400000 |           8 |          30

(5 rows)

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (1402359300000, 5) and (interval_id) <
(1402359900000);

 skill      | interval_id   | skill_level | skill_count
------------+---------------+-------------+-------------
 Complaints | 1402359300000 |           5 |          20  <- desired
 Complaints | 1402359300000 |           8 |          30  <- desired
 Complaints | 1402359300000 |          10 |           1  <- desired
 Complaints | 1402359400000 |           2 |          10  <- SKIP
 Complaints | 1402359400000 |           8 |          30  <- desired

The query results in a discontinuous range slice so isn't supported --
Essentially, the client will have to read the entire range and perform
client-side filtering.  Whether this is efficient depends on the
cardinality of skill_level.

I tried playing with the "allow filtering" cql clause, but it would appear
from the documentation it's very restrictive...





On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> or :
>
>
> select * from skill_count where skill='Complaints'
> and (interval_id,skill_level) >= (1402359300000,5)
> and (interval_id) < (1402359900000)
>
> Strange enough, when starting using tuple notation you'll need to stick to
> it even if there is only one element in the tuple
>
>
> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Sorry, I've just checked, the correct query should be:
>>
>> select * from skill_count where skill='Complaints' and
>> (interval_id,skill_level) >= (1402359300000,5) and
>> (interval_id,skill_level) < (1402359900000,11)
>>
>>
>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Hello Mathew
>>>
>>>  Since Cassandra 2.0.6 it is possible to query over composites:
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> For your example:
>>>
>>> select * from skill_count where skill='Complaints' and
>>> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
>>> 1402359900000;
>>>
>>>
>>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
>>> matthew.j.al...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a roll-up table that as follows.
>>>>
>>>> CREATE TABLE SKILL_COUNT (
>>>>   skill text,
>>>>   interval_id bigint,
>>>>   skill_level int,
>>>>   skill_count int,
>>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>>
>>>> Essentially,
>>>>   skill = a names skill i.e. "Complaints"
>>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>>   skill_level = a number/rating from 1-10
>>>>   skill_count = the number of people with the specified skill, with the
>>>> specified skill level, logged in at the interval_id
>>>>
>>>> We'd like to run the following query against it
>>>>
>>>> select * from skill_count where skill='Complaints' and interval_id >=
>>>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>>>
>>>> to get a count of people with the relevant skill and level at the
>>>> appropriate time.  However I am getting the following message.
>>>>
>>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>>> (preceding part interval_id is either not restricted or by a non-EQ
>>>> relation)
>>>>
>>>> Looking at how the data is stored ...
>>>>
>>>> -------------------
>>>> RowKey: Complaints
>>>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>>>> => (name=1402359300000:2:skill_count, value=0000000a,
>>>> timestamp=1405308260403000)
>>>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>>>> => (name=1402359300000:5:skill_count, value=00000014,
>>>> timestamp=1405308260403001)
>>>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>>>> => (name=1402359300000:8:skill_count, value=0000001e,
>>>> timestamp=1405308260419000)
>>>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>>>> => (name=1402359300000:10:skill_count, value=00000001,
>>>> timestamp=1405308260419001)
>>>>
>>>> Should cassandra be able to allow for an extra level of filtering ? or
>>>> is this something that should be performed from within the application.
>>>>
>>>> We have a solution working in Oracle, but would like to store this data
>>>> in Cassandra, as all the other data that this solution relies on already
>>>> sits within Cassandra.
>>>>
>>>> Appreciate any guidance on this matter.
>>>>
>>>> Matt
>>>>
>>>
>>>
>>
>


-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
<http://www.schange.com/en-US/Company/InvestorRelations.aspx>
Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hanc...@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks [image: LinkedIn]
<http://www.linkedin.com/in/kenhancock>

[image: SeaChange International]
 <http://www.schange.com/>This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.

Re: Multi-column range scans

Reply via email to