[jira] [Comment Edited] (CASSANDRA-6412) Custom creation and merge functions for user-defined column types

Jeff Jirsa (JIRA) Tue, 07 Apr 2015 19:33:28 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484355#comment-14484355 ]


Jeff Jirsa edited comment on CASSANDRA-6412 at 4/8/15 2:32 AM:
---------------------------------------------------------------

I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 
as a base.

{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> create table table_with_resolvers ( id text, low int with resolver 'org.apache.cassandra.db.resolvers.MinValueResolver', high int with resolver 'org.apache.cassandra.db.resolvers.MaxValueResolver', last int with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver', first int with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver', PRIMARY KEY(id));
cqlsh:test2> select column_name, column_resolver from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver
-------------+------------------------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver
        high |         org.apache.cassandra.db.resolvers.MaxValueResolver
          id |        org.apache.cassandra.db.resolvers.TimestampResolver
        last |        org.apache.cassandra.db.resolvers.TimestampResolver
         low |         org.apache.cassandra.db.resolvers.MinValueResolver

(5 rows)
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 1, 1, 1, 1);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 2, 2, 2, 2);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 3, 3, 3, 3);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 5, 5, 5, 5);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 4, 4, 4, 4);
cqlsh:test2> select * from table_with_resolvers;

 id | first | high | last | low
----+-------+------+------+-----
  1 |     1 |    5 |    4 |   1

(1 rows)
{noformat}
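
To make the expected result row easier to check, here is a minimal Python sketch of the four resolver behaviours applied to the same five writes. It is only an illustration: the `Cell` class, the function names, and the use of the insert order as the timestamp are assumptions, not the resolver classes from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int  # assumed: a higher timestamp means a newer write


# Each resolver picks the surviving cell when two versions of a column meet.
def timestamp_resolver(a, b):          # last write wins (the default)
    return a if a.timestamp >= b.timestamp else b


def reverse_timestamp_resolver(a, b):  # first write wins
    return a if a.timestamp <= b.timestamp else b


def min_value_resolver(a, b):          # smallest value wins
    return a if a.value <= b.value else b


def max_value_resolver(a, b):          # largest value wins
    return a if a.value >= b.value else b


# Same column-to-resolver mapping as the table in the session above.
RESOLVERS = {
    "first": reverse_timestamp_resolver,
    "high": max_value_resolver,
    "last": timestamp_resolver,
    "low": min_value_resolver,
}


def apply_writes(values):
    """Write each value to every column, using the write order as timestamp."""
    row = {}
    for ts, value in enumerate(values, start=1):
        incoming = Cell(value, ts)
        for column, resolve in RESOLVERS.items():
            row[column] = resolve(row[column], incoming) if column in row else incoming
    return {column: cell.value for column, cell in row.items()}


result = apply_writes([1, 2, 3, 5, 4])
print(result)  # {'first': 1, 'high': 5, 'last': 4, 'low': 1}
```

The printed values match the row returned by the final select above: the first write survives in `first`, the largest value in `high`, the most recent write in `last`, and the smallest value in `low`.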

My diff/patch isn't fit for sharing yet, but as I go through it, I have some
questions:

1) Given that user types are currently frozen, does it make sense to allow a
resolver per field in user types, assuming that user types will eventually
become un-frozen?
2) My initial pass disallows custom resolvers on counters and collections.
Does anyone have a strong opinion on whether user-defined merge functions
should be allowed for collections?
3) Given that deletes are not commutative, I'm strongly considering having the
built-in resolvers (min, max, first-write-wins, and the default
last-write-wins) always let a tombstone with a higher timestamp take priority
over anything with a lower timestamp (that is, last-write-always-wins for
tombstones). That works around SOME of the corner cases involving deletes.
Given that these are regular cells with valid timestamps, does that not
address some of the concern?
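
The tombstone rule in question 3 can be sketched in a few lines of Python. This is one possible reading, not the behaviour of the patch: the `Cell` class, the use of `None` to mark a tombstone, the tie-break in favour of the tombstone, and the choice to let a live cell with a higher timestamp beat an older tombstone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: Optional[int] = None  # assumed: None marks a tombstone

    @property
    def is_tombstone(self):
        return self.value is None


def min_value(a, b):
    """Stand-in for a value-based resolver: the smallest value wins."""
    return a if a.value <= b.value else b


def reconcile(a, b, resolver):
    """If a tombstone is involved, the higher timestamp wins (a tombstone
    wins a tie); otherwise the column's own resolver decides."""
    if a.is_tombstone or b.is_tombstone:
        return max(a, b, key=lambda c: (c.timestamp, c.is_tombstone))
    return resolver(a, b)


early = Cell(timestamp=1, value=1)
delete = Cell(timestamp=2)           # tombstone
late = Cell(timestamp=3, value=3)

# The delete at timestamp 2 removes the older value regardless of the resolver.
print(reconcile(early, delete, min_value).is_tombstone)  # True

# Merge order still matters for value-based resolvers in this sketch.
one_order = reconcile(reconcile(early, delete, min_value), late, min_value)
other_order = reconcile(reconcile(early, late, min_value), delete, min_value)
print(one_order.value, other_order.is_tombstone)  # 3 True
```

In this sketch the two merge orders disagree for the min resolver: one keeps the value written after the delete, the other ends up deleted. That is consistent with the rule covering only some of the corner cases.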



was (Author: jjirsa):
I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 
as a base.

{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> create table table_with_resolvers ( id text, last text with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver', first text with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver', PRIMARY KEY(id));
cqlsh:test2> select column_name, column_resolver, type, validator  from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver                                            | type          | validator
-------------+------------------------------------------------------------+---------------+------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type
          id |        org.apache.cassandra.db.resolvers.TimestampResolver | partition_key | org.apache.cassandra.db.marshal.UTF8Type
        last |        org.apache.cassandra.db.resolvers.TimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type

cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '1', '1');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '2', '2');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '3', '3');
cqlsh:test2> select * from table_with_resolvers ;

 id | first | last
----+-------+------
  1 |     1 |    3

(1 rows)
{noformat}

My diff/patch isn't fit for sharing yet, but as I go through it, I have some
questions:

1) Given that user types are currently frozen, does it make sense to allow a
resolver per field in user types, assuming that user types will eventually
become un-frozen?
2) My initial pass disallows custom resolvers on counters and collections.
Does anyone have a strong opinion on whether user-defined merge functions
should be allowed for collections?
3) Given that deletes are not commutative, I'm strongly considering having the
built-in resolvers (min, max, first-write-wins, and the default
last-write-wins) always let a tombstone with a higher timestamp take priority
over anything with a lower timestamp (that is, last-write-always-wins for
tombstones). That works around SOME of the corner cases involving deletes.
Given that these are regular cells with valid timestamps, does that not
address some of the concern?


> Custom creation and merge functions for user-defined column types
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-6412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6412
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Nicolas Favre-Felix
>
> This is a proposal for a new feature, mapping custom types to Cassandra 
> columns.
> These types would provide a creation function and a merge function, to be 
> implemented in Java by the user.
> This feature relates to the concept of CRDTs; the proposal is to replicate 
> "operations" on these types during write, to apply these operations 
> internally during merge (Column.reconcile), and to also merge their values on 
> read.
> The following operations are made possible without reading back any data:
> * MIN or MAX(value) for a column
> * First value for a column
> * Count Distinct
> * HyperLogLog
> * Count-Min
> And any composition of these too, e.g. a Candlestick type includes first, 
> last, min, and max.
> The merge operations exposed by these types need to be commutative; this is 
> the case for many functions used in analytics.
> This feature is incomplete without some integration with CASSANDRA-4775 
> (Counters 2.0) which provides a Read-Modify-Write implementation for 
> distributed counters. Integrating custom creation and merge functions with 
> new counters would let users implement complex CRDTs in Cassandra, including:
> * Averages & related (sum of squares, standard deviation)
> * Graphs
> * Sets
> * Custom registers (even with vector clocks)
> I have a working prototype with implementations for min, max, and Candlestick 
> at https://github.com/acunu/cassandra/tree/crdts - I'd appreciate any 
> feedback on the design and interfaces.
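
The Candlestick example in the description above (first, last, min, and max) can be sketched as a type with a creation function and a commutative merge function. This is a Python illustration only: the `Candlestick` class, its `create` and `merge` names, and the (timestamp, value) tuple layout are assumptions and are not taken from the prototype branch.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations
from typing import Tuple


@dataclass(frozen=True)
class Candlestick:
    first: Tuple[int, int]  # (timestamp, value) of the earliest write
    last: Tuple[int, int]   # (timestamp, value) of the latest write
    low: int
    high: int

    @classmethod
    def create(cls, timestamp, value):
        """Creation function: a single write is its own first, last, min, max."""
        return cls((timestamp, value), (timestamp, value), value, value)

    def merge(self, other):
        """Merge function: every field is a min or max, so argument order
        does not matter. Tuples compare by timestamp first, then by value."""
        return Candlestick(
            first=min(self.first, other.first),
            last=max(self.last, other.last),
            low=min(self.low, other.low),
            high=max(self.high, other.high),
        )


writes = [Candlestick.create(ts, v) for ts, v in [(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)]]
merged = reduce(Candlestick.merge, writes)
print(merged.first[1], merged.high, merged.last[1], merged.low)  # 1 5 4 1

# Every merge order of the same writes yields the same result.
same = all(reduce(Candlestick.merge, p) == merged for p in permutations(writes))
print(same)  # True
```

Because each field is reduced with `min` or `max`, replicas can merge the same writes in any order and converge on the same value, which is the property the description asks of merge operations.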



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
