[ https://issues.apache.org/jira/browse/CASSANDRA-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484355#comment-14484355 ]
Jeff Jirsa edited comment on CASSANDRA-6412 at 4/8/15 2:32 AM:
---------------------------------------------------------------

I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 as a base.

{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> select column_name, column_resolver from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver
-------------+------------------------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver
        high |         org.apache.cassandra.db.resolvers.MaxValueResolver
          id |        org.apache.cassandra.db.resolvers.TimestampResolver
        last |        org.apache.cassandra.db.resolvers.TimestampResolver
         low |         org.apache.cassandra.db.resolvers.MinValueResolver

(5 rows)

cqlsh:test2> create table table_with_resolvers ( id text, low int with resolver 'org.apache.cassandra.db.resolvers.MinValueResolver', high int with resolver 'org.apache.cassandra.db.resolvers.MaxValueResolver', last int with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver', first int with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver', PRIMARY KEY(id));
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 1, 1, 1, 1);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 2, 2, 2, 2);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 3, 3, 3, 3);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 5, 5, 5, 5);
cqlsh:test2> insert into table_with_resolvers (id, low, high, first, last ) values ('1', 4, 4, 4, 4);
cqlsh:test2> select * from table_with_resolvers;

 id | first | high | last | low
----+-------+------+------+-----
  1 |     1 |    5 |    4 |   1

(1 rows)
{noformat}

My diff/patch isn't fit for sharing at
this time but as I'm going through, I had some questions:

1) Given that user types are frozen, does it make sense to allow a resolver per field in user types, assuming that eventually user types will become un-frozen?
2) My initial pass disallows custom resolvers on counters and collections - does anyone have any strong opinion on whether or not user-defined merge functions should be allowed for collections?
3) Given that deletes are not commutative, I'm strongly considering making it so that built-in resolvers (min, max, first-write-wins, and default last-write-wins) simply always allow tombstones with a higher timestamp to take priority over anything else with a lower timestamp (that is, last-write-always-wins with tombstones). That works around SOME of the corner issues involving deletes - given that these are regular cells and have valid timestamps, does that not address some of the concern?


was (Author: jjirsa):
I'm playing with this, just to understand it conceptually, using CASSANDRA-8099 as a base.
{noformat}
cqlsh> create keyspace test2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}; use test2;
cqlsh:test2> create table table_with_resolvers ( id text, last text with resolver 'org.apache.cassandra.db.resolvers.TimestampResolver', first text with resolver 'org.apache.cassandra.db.resolvers.ReverseTimestampResolver', PRIMARY KEY(id));
cqlsh:test2> select column_name, column_resolver, type, validator from system.schema_columns where keyspace_name='test2' and columnfamily_name='table_with_resolvers';

 column_name | column_resolver                                            | type          | validator
-------------+------------------------------------------------------------+---------------+------------------------------------------
       first | org.apache.cassandra.db.resolvers.ReverseTimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type
          id |        org.apache.cassandra.db.resolvers.TimestampResolver | partition_key | org.apache.cassandra.db.marshal.UTF8Type
        last |        org.apache.cassandra.db.resolvers.TimestampResolver |       regular | org.apache.cassandra.db.marshal.UTF8Type

cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '1', '1');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '2', '2');
cqlsh:test2> insert into table_with_resolvers (id, first, last ) values ('1', '3', '3');
cqlsh:test2> select * from table_with_resolvers ;

 id | first | last
----+-------+------
  1 |     1 |    3

(1 rows)
{noformat}

My diff/patch isn't fit for sharing at this time but as I'm going through, I had some questions:

1) Given that user types are frozen, does it make sense to allow a resolver per field in user types, assuming that eventually user types will become un-frozen?
2) My initial pass disallows custom resolvers on counters and collections - does anyone have any strong opinion on whether or not user defined merge functions should be allowed for collections?
3) Given that deletes are not commutative, I'm strongly considering making it so that built-in resolvers (min, max, first-write-wins, and default last-write-wins) simply always allow tombstones with a higher timestamp to take priority over anything else with a lower timestamp (that is, last-write-always-wins with tombstones). That works around SOME of the corner issues involving deletes - given that these are regular cells and have valid timestamps, does that not address some of the concern?


> Custom creation and merge functions for user-defined column types
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-6412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6412
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Nicolas Favre-Felix
>
> This is a proposal for a new feature, mapping custom types to Cassandra
> columns.
> These types would provide a creation function and a merge function, to be
> implemented in Java by the user.
> This feature relates to the concept of CRDTs; the proposal is to replicate
> "operations" on these types during write, to apply these operations
> internally during merge (Column.reconcile), and to also merge their values on
> read.
> The following operations are made possible without reading back any data:
> * MIN or MAX(value) for a column
> * First value for a column
> * Count Distinct
> * HyperLogLog
> * Count-Min
> And any composition of these too, e.g. a Candlestick type includes first,
> last, min, and max.
> The merge operations exposed by these types need to be commutative; this is
> the case for many functions used in analytics.
> This feature is incomplete without some integration with CASSANDRA-4775
> (Counters 2.0) which provides a Read-Modify-Write implementation for
> distributed counters.
> Integrating custom creation and merge functions with
> new counters would let users implement complex CRDTs in Cassandra, including:
> * Averages & related (sum of squares, standard deviation)
> * Graphs
> * Sets
> * Custom registers (even with vector clocks)
> I have a working prototype with implementations for min, max, and Candlestick
> at https://github.com/acunu/cassandra/tree/crdts - I'd appreciate any
> feedback on the design and interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
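
For readers following the thread, the resolver behaviour shown in the cqlsh session of the edited comment, together with the tombstone rule floated in question 3, can be modelled in a few lines. This is only a rough sketch in Python, not the patch discussed in the comment: `Cell`, `tombstone_aware`, and the four resolver functions are hypothetical names, a tombstone is modelled as a cell whose value is `None`, the five inserts are assumed to carry increasing timestamps 1..5, and the tie-break (a tombstone wins on equal timestamps) is an assumption.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell model: value None stands in for a tombstone."""
    value: Optional[int]
    timestamp: int


Resolver = Callable[[Cell, Cell], Cell]


def tombstone_aware(pick: Resolver) -> Resolver:
    """Wrap a resolver so that, if either cell is a tombstone, the cell with
    the higher timestamp wins (one reading of question 3)."""
    def resolve(a: Cell, b: Cell) -> Cell:
        if a.value is None or b.value is None:
            if a.timestamp != b.timestamp:
                return a if a.timestamp > b.timestamp else b
            # Assumption: on equal timestamps the tombstone wins.
            return a if a.value is None else b
        return pick(a, b)
    return resolve


# Stand-ins for MinValueResolver, MaxValueResolver, TimestampResolver
# and ReverseTimestampResolver from the session.
min_value = tombstone_aware(lambda a, b: a if a.value <= b.value else b)
max_value = tombstone_aware(lambda a, b: a if a.value >= b.value else b)
last_write = tombstone_aware(lambda a, b: a if a.timestamp >= b.timestamp else b)
first_write = tombstone_aware(lambda a, b: a if a.timestamp <= b.timestamp else b)

# The five inserts from the session: values 1, 2, 3, 5, 4 in write order.
writes = [Cell(v, ts) for ts, v in enumerate([1, 2, 3, 5, 4], start=1)]
row = {
    "first": reduce(first_write, writes).value,
    "high": reduce(max_value, writes).value,
    "last": reduce(last_write, writes).value,
    "low": reduce(min_value, writes).value,
}
print(row)  # {'first': 1, 'high': 5, 'last': 4, 'low': 1}
```

Under this sketch's reading of the tombstone rule, the result can still depend on how merges are grouped: merging `Cell(1, 1)` with `Cell(None, 2)` first and then with `Cell(5, 3)` yields 5 under `min_value`, while merging the last two first yields 1. That may be one of the corner cases that the rule does not cover.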