[
https://issues.apache.org/jira/browse/CASSANDRA-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044247#comment-14044247
]
graham sanderson edited comment on CASSANDRA-7449 at 6/26/14 1:25 AM:
----------------------------------------------------------------------
On a separate note, a few things I noticed in passing:
1) In CFDefinition - a bug that should be fixed separately if this feature isn't implemented:
{code}
 public Collection<Name> staticColumns()
 {
-    return regularColumns.values();
+    return staticColumns.values();
 }
{code}
2) If you insert an empty (zero-byte) partition key value and you have a static column, you get an assertion error on a regular SELECT * - I'll make a better repro for this and file an issue
> Variation of SELECT DISTINCT to find clustering keys with only static columns
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-7449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7449
> Project: Cassandra
> Issue Type: Improvement
> Components: API, Core
> Reporter: graham sanderson
> Attachments: paging_broken_no_tests_v0.patch
>
>
> A possible use case for static columns involves (per partition) multiple
> small TTL time series data values combined with a potentially much larger
> static piece of data.
> While the TTL time series data will go away on its own, there is no way to TTL the static data (and keep it updated with the latest TTL) without re-inserting it every time to reset the TTL, which is undesirable since the value is large and unchanged.
> The use case looks something like this:
> {code}
> CREATE KEYSPACE test WITH replication = {
>   'class': 'SimpleStrategy',
>   'replication_factor': '1'
> };
> USE test;
> CREATE TABLE expiring_series (
>   id text,
>   series_order int,
>   small_data text,
>   large_data text static,
>   PRIMARY KEY (id, series_order)
> );
> INSERT INTO expiring_series (id, large_data) VALUES ('123', 'this is large and should not be inserted every time');
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 1, 'antelope') USING TTL 120;
> // time passes (point A)
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 2, 'gibbon') USING TTL 120;
> // time passes (point B)
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 3, 'firebucket') USING TTL 120;
> // time passes (point C)
> // time passes and the first row expires (point D)
> // more time passes and eventually all the "rows" expire (point E)
> {code}
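> For reference, keeping the static column's TTL in step with the series data today means repeating the large value on every write. A hedged sketch of that (undesirable) workaround, using the schema above:
> {code}
> -- sketch of today's workaround, not the proposal: the large static value must
> -- be re-written with a fresh TTL alongside every series insert
> -- (series_order 4 / 'marmot' are made-up example values)
> INSERT INTO expiring_series (id, large_data) VALUES ('123', 'this is large and should not be inserted every time') USING TTL 120;
> INSERT INTO expiring_series (id, series_order, small_data) VALUES ('123', 4, 'marmot') USING TTL 120;
> {code}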
> Given the way the storage engine works, there is no trivial way to make the static column expire when the last row expires; however, if there were an easy way to find partitions with no regular rows (just static columns), that would make manual cleanup easy.
> The possible implementation of such a feature is very similar to SELECT DISTINCT, so I'm suggesting SELECT STATICONLY.
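> For contrast, the closest client-side workaround today is to enumerate partitions with SELECT DISTINCT and probe each one; Point E below shows why the probe works: when only static data remains, the clustering and regular columns come back null. A hedged sketch:
> {code}
> -- enumerate candidate partitions
> SELECT DISTINCT id FROM expiring_series;
> -- probe one partition; a null series_order means only static data is left,
> -- so the partition is a cleanup candidate (the null check is done client-side)
> SELECT id, series_order FROM expiring_series WHERE id = '123' LIMIT 1;
> {code}
> This costs a round trip per partition, which SELECT STATICONLY would avoid.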
> Looking at the points again:
> h4. Point A
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |             108
> (1 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point B
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |              87
>  123 |            2 |     gibbon | this is large and should not be inserted every time |             111
> (2 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> (2 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point C
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            1 |   antelope | this is large and should not be inserted every time |              67
>  123 |            2 |     gibbon | this is large and should not be inserted every time |              91
>  123 |            3 | firebucket | this is large and should not be inserted every time |             110
> (3 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> 123
> (3 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point D
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |            2 |     gibbon | this is large and should not be inserted every time |              22
>  123 |            3 | firebucket | this is large and should not be inserted every time |              41
> (2 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> id
> -----
> 123
> 123
> (2 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
> h4. Point E
> {code}
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
>  id  | series_order | small_data | large_data                                          | ttl(small_data)
> -----+--------------+------------+-----------------------------------------------------+-----------------
>  123 |         null |       null | this is large and should not be inserted every time |            null
> (1 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> id
> -----
> 123
> (1 rows)
> {code}
> Notice that after the last row has expired, SELECT STATICONLY id returns the partition, and it can then be deleted (under whatever concurrency rules the application needs):
> {code}
> cqlsh:test> DELETE FROM expiring_series where id = '123';
> cqlsh:test> SELECT id, series_order, small_data, large_data, ttl(small_data) from expiring_series;
> (0 rows)
> cqlsh:test> SELECT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT DISTINCT id FROM expiring_series;
> (0 rows)
> cqlsh:test> SELECT STATICONLY id FROM expiring_series;
> (0 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)