[
https://issues.apache.org/jira/browse/CASSANDRA-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044247#comment-14044247
]
graham sanderson edited comment on CASSANDRA-7449 at 6/26/14 1:25 AM:
----------------------------------------------------------------------
On a separate note, a few things I noticed in passing:
1) In CFDefinition - a bug that should be fixed separately if this feature isn't implemented:
{code}
 public Collection<Name> staticColumns()
 {
-    return regularColumns.values();
+    return staticColumns.values();
 }
{code}
2) If you insert an empty (zero-byte) partition key value and you have a static column, you get an assertion error on a regular SELECT * - I'll make a better repro for this and file an issue
> Variation of SELECT DISTINCT to find clustering keys with only static columns
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-7449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7449
> Project: Cassandra
> Issue Type: Improvement
> Components: API, Core
> Reporter: graham sanderson
> Attachments: paging_broken_no_tests_v0.patch
>
>
> A possible use case for static columns involves (per partition) multiple
> small TTL time series data values combined with a potentially much larger
> static piece of data.
> While the TTL time series data will go away on its own, there is no way to TTL the static data (and keep it updated with the latest TTL) without re-inserting it every time to reset the TTL, which is undesirable since the value is large and unchanged.
> The use case looks something like this:
> {code}
> CREATE KEYSPACE test WITH replication = {
>   'class': 'SimpleStrategy',
>   'replication_factor': '1'
> };
> USE test;
> CREATE TABLE expiring_series (
>   id text,
>   series_order int,
>   small_data text,
>   large_data text static,
>   PRIMARY KEY (id, series_order)
> );
> INSERT INTO expiring_series (id, large_data) VALUES ('123', 'this is large and should not be inserted every time');
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 1, 'antelope') USING TTL 120;
> // time passes (point A)
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 2, 'gibbon') USING TTL 120;
> // time passes (point B)
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 3, 'firebucket') USING TTL 120;
> // time passes (point C)
> // time passes and the first row expires (point D)
> // more time passes and eventually all the "rows" expire (point E)
> {code}
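> For reference, keeping the static column's TTL in step with the series data today means repeating the large value on every write. A hedged sketch of that (undesirable) workaround, using the schema above:
> {code}
> -- sketch of today's workaround, not the proposal: the large static value must
> -- be re-written with a fresh TTL alongside every series insert
> -- (series_order 4 / 'marmot' are made-up example values)
> INSERT INTO expiring_series (id, large_data) VALUES ('123', 'this is large and should not be inserted every time') USING TTL 120;
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 4, 'marmot') USING TTL 120;
> {code}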
> Given the way the storage engine works, there is no trivial way to make the static column expire when the last row expires; however, if there were an easy way to find partitions with no regular rows (just static columns), that would make manual cleanup easy.
> The possible implementation of such a feature is very similar to SELECT DISTINCT, so I'm suggesting SELECT STATICONLY.
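> For contrast, the closest client-side workaround today is to enumerate partitions with SELECT DISTINCT and probe each one; Point E below shows why the probe works: when only static data remains, the clustering and regular columns come back null. A hedged sketch:
> {code}
> -- enumerate candidate partitions
> SELECT DISTINCT id FROM expiring_series;
> -- probe one partition; a null series_order means only static data is left,
> -- so the partition is a cleanup candidate (the null check is done client-side)
> SELECT id, series_order FROM expiring_series WHERE id = '123' LIMIT 1;
> {code}
> This costs a round trip per partition, which SELECT STATICONLY would avoid.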
> Looking at the points again:
> h4. Point A
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |             108
> (1 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point B
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |              87
>  123 |            2 |     gibbon | this is large and should not be inserted every time |             111
> (2 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> (2 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point C
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |              67
>  123 |            2 |     gibbon | this is large and should not be inserted every time |              91
>  123 |            3 | firebucket | this is large and should not be inserted every time |             110
> (3 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> 123
> (3 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point D
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            2 |     gibbon | this is large and should not be inserted every time |              22
>  123 |            3 | firebucket | this is large and should not be inserted every time |              41
> (2 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> (2 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point E
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |         null |       null | this is large and should not be inserted every time |            null
> (1 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> {code}
> Notice that after the last row has expired, SELECT STATICONLY id returns the partition, and it can then be deleted (under whatever concurrency rules the application needs):
> {code}
> cqlsh:test> DELETE FROM expiring_series where id = '123';
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
> (0 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)