--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND contactname=yyyy ? Answer : Because a contact name can be duplicated. Or should I force unique contact names?
In this case, add contactid as extra clustering column to guarantee unicity for your contact. The delete query becomes: DELETE FROM user_contact WHERE userid=xxx AND contactname=yyyy AND contactid=zzz Normally, from the front-end (web apps or smartphone client) if you have the contactname, you SURELY also have the contactid information. Consequently, you can issue the above DELETE statement without having to read-before-delete, am I wrong ? On Tue, Jan 12, 2016 at 2:02 PM, I PVP <i...@hotmail.com> wrote: > --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND > contactname=yyyy ? > Answer : Because a contact name can be duplicated. Or should I force > unique contact names? > > Overall , the challenge seems to be addressed , with some trade of on the > "ordering by contact name”. > > If, at the base table, the clustering column is the objectid(timeuuid) > instead of the contactname , the DELETE will be based on userid = ? and > objectid = ?. > This works fine. Generic SELECTs will also work fine on the base table. > > The MV will serve SELECTs targeting/filtering the favorite contacts. > > Like this: > > CREATE TABLE communication.user_contact ( > userid int, > objectid timeuuid, > contactid int, > contactname text, > createdat timeuuid, > favoriteat timestamp, > isfavorite boolean, > PRIMARY KEY (userid, objectid) > ); > > > CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS > SELECT userid, isfavorite, objectid, contactid, contactname, createdat, > favoriteat > FROM user_contact > WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT > NULL > PRIMARY KEY ( ( userid, isfavorite ), objectid ) > WITH CLUSTERING ORDER BY ( objectid DESC ) ; > > > Unfortunately this approach forces the model to cluster by > objected(timeuuid) just to satisfy the need to DELETE a specific contact > row, and by doing that it wastes a opportunity on the MV, because all the > PKs from the base table need to be at the MV and it is not possible to set > the MV with with more than 1 non-PK column from the base table as the MV > PK. But still working fine. > > > That is my first Cassandra use case and the guidance provided by you guys > pretty important. > > Thanks very much for the answers, questions and suggestions. > > > -- > IPVP > > > From: DuyHai Doan <doanduy...@gmail.com> <doanduy...@gmail.com> > Reply: user@cassandra.apache.org <user@cassandra.apache.org>> > <user@cassandra.apache.org> > Date: January 12, 2016 at 10:27:45 AM > To: user@cassandra.apache.org <user@cassandra.apache.org>> > <user@cassandra.apache.org> > Cc: Jack Krupansky <jack.krupan...@gmail.com>> <jack.krupan...@gmail.com> > > Subject: Re: Modeling contact list, plain table or List > > 1)SELECT all rows from user_contact excluding the one that the user wants > to get rid of. > 2) DELETE all the user_contact rows for that particular user . > 3) INSERT the result of 1). > > --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND > contactname=yyyy ? > > The Materialized View will be automagically updated by Cassandra with a > query similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND > is_favorite=zzz AND contactname=yyy > > On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad <j...@jonhaddad.com> > wrote: > >> In general I advise people avoid lists and use Maps or Sets instead. >> >> Using this data model, for instance, it's easy to remove a specific >> Address from a user: >> >> CREATE TYPE address ( >> street text, >> city text, >> zip_code int, >> ); >> >> CREATE TABLE user ( >> user_id int primary key, >> addresses map<text, frozen <address>> >> ); >> >> When I want to remove one of the addresses from a user, I can do this: >> >> cqlsh:test> delete addresses['home'] from user where user_id = 1; >> >> >> Hope that helps, >> Jon >> >> >> On Mon, Jan 11, 2016 at 1:20 PM I PVP <i...@hotmail.com> wrote: >> >>> Well…the way it is now it is not possible to delete a specific contact >>> row from the base table at all. Because a DELETE statement only works with >>> PK in the WHERE clausule. Non PK columns cannot be in the DELETE WHERE >>> clausule. >>> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html >>> >>> The way it is now It is only possible to delete the entire contact list >>> for that specific user. >>> Looks like will need to: >>> 1)SELECT all rows from user_contact excluding the one that the user >>> wants to get rid of. >>> 2) DELETE all the user_contact rows for that particular user . >>> 3) INSERT the result of 1). >>> >>> Is that the proper way to achieve it or Am I missing some point in the >>> modeling that would allow to delete a specific contact row and still able >>> to comply with the select requirements? >>> >>> Thanks >>> -- >>> IPVP >>> >>> >>> From: Jack Krupansky <jack.krupan...@gmail.com> >>> <jack.krupan...@gmail.com> >>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>> >>> <user@cassandra.apache.org> >>> Date: January 11, 2016 at 7:00:04 PM >>> >>> To: user@cassandra.apache.org <user@cassandra.apache.org>> >>> <user@cassandra.apache.org> >>> Subject: Re: Modeling contact list, plain table or List >>> >>> That's the beauty of MV - Cassandra automatically updates the MVs when >>> the base table changes, including deletions, which is why all of the PK >>> columns from the base table needed to be in the MV PK. >>> >>> -- Jack Krupansky >>> >>> On Mon, Jan 11, 2016 at 3:41 PM, I PVP <i...@hotmail.com> wrote: >>> >>>> The below table and materialized view will solve the SELECT >>>> requirements of my current application . >>>> The challenge now is when the user decides to DELETE one specific >>>> contact from his contact list. I could add the objectid to a composite >>>> partition key together with the userid. But that would make the SELECT >>>> inviable. >>>> >>>> Any ideas/suggestions? >>>> >>>> >>>> CREATE TABLE communication.user_contact ( >>>> userid int, >>>> contactname text, >>>> contactid int, >>>> createdat timeuuid, >>>> favoriteat timestamp, >>>> isfavorite boolean, >>>> objectid timeuuid, >>>> PRIMARY KEY (userid, contactname) >>>> ) WITH CLUSTERING ORDER BY ( contactname DESC ) >>>> >>>> >>>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS >>>> SELECT userid, isfavorite, contactname, contactid, createdat, >>>> favoriteat, objectid >>>> FROM user_contact >>>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS >>>> NOT NULL >>>> PRIMARY KEY ( ( userid, isfavorite ), contactname ) >>>> WITH CLUSTERING ORDER BY ( contactname DESC ) >>>> >>>> Thanks >>>> >>>> -- >>>> IPVP >>>> >>>> >>>> From: DuyHai Doan <doanduy...@gmail.com> <doanduy...@gmail.com> >>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>> >>>> <user@cassandra.apache.org> >>>> Date: January 11, 2016 at 11:14:10 AM >>>> >>>> To: user@cassandra.apache.org <user@cassandra.apache.org>> >>>> <user@cassandra.apache.org> >>>> Subject: Re: Modeling contact list, plain table or List >>>> >>>> In the current iteration of materialized view, it is still not possible >>>> to have WHERE clause other than IS NOT NULL so is_favourite IS TRUE >>>> won't work. >>>> >>>> Still there is a JIRA created to support this feature : >>>> https://issues.apache.org/jira/browse/CASSANDRA-10368 >>>> >>>> About cardinality of favorite vs non-favorites, it doesn't matter in this >>>> case because the OP said "Less then one hundred contacts by user is >>>> the normal." >>>> >>>> So even if all contacts are stuck in one unique favorite state, the >>>> materialized view partition for one user is at most 100. Even for extreme >>>> edge case with users having 10 000 contacts, it's still a manageable >>>> partition size for C*. >>>> >>>> But I agree it is important to know before-hand the >>>> favorite/non-favorite update frequency since it will impact the write >>>> throughput on the MV. >>>> >>>> For more details on materialized view impl and performance: >>>> http://www.doanduyhai.com/blog/?p=1930 >>>> >>>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky < >>>> jack.krupan...@gmail.com> wrote: >>>> >>>>> The new Materialized View feature is just an automated way of creating >>>>> and maintaining what people used to call a "query table", which is the >>>>> traditional Cassandra data modeling technique for performing queries on on >>>>> than the primary key for a table - you store the same columns in different >>>>> tables using different columns for the primary key. >>>>> >>>>> One also needs to be careful to include all columns of the original >>>>> primary key in each MV primary key - in addition to whatever column(s) are >>>>> to be used for indexing in each MV (so that Cassandra can find the old row >>>>> when it needs to update the MV when the base table row changes, such as on >>>>> a deletion.) >>>>> >>>>> But before creating MVs, you first need to answer questions about how >>>>> the app needs to query the data. Even with MV, conceptualizing queries >>>>> needs to precede data modeling. >>>>> >>>>> For example, what is the cardinality of favorites vs. non-favorites, >>>>> does the app even need to query by favorates, as opposed to querying all >>>>> contacts and retrieving is_favorite as simply a non-key column value, >>>>> whether favorites need to be retrieved separately from non-favorites, the >>>>> frequency and latency requirements for query by favorite status, etc. Once >>>>> these questions are answered, decisions can be made about data modeling. >>>>> >>>>> -- Jack Krupansky >>>>> >>>>> On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso <i...@mrcalonso.com> >>>>> wrote: >>>>> >>>>>> I have never used Materialized Views so maybe this suggestion is not >>>>>> possible, but in this case, wouldn't it make sense to define the >>>>>> materialized view as >>>>>> >>>>>> is_favourite IS TRUE >>>>>> instead of >>>>>> is_favourite IS NOT NULL? >>>>>> >>>>>> Carlos Alonso | Software Engineer | @calonso >>>>>> <https://twitter.com/calonso> >>>>>> >>>>>> On 10 January 2016 at 09:59, DuyHai Doan <doanduy...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Try this >>>>>>> >>>>>>> CREATE TABLE communication.user_contact_list ( >>>>>>> user_id uuid, >>>>>>> contact_id uuid, >>>>>>> contact_name text, >>>>>>> created_at timeuuid, >>>>>>> is_favorite boolean, >>>>>>> favorite_at timestamp, >>>>>>> PRIMARY KEY (user_id, contact_name, contact_id) >>>>>>> ); >>>>>>> >>>>>>> CREATE MATERIALIZED VIEW communication.user_favorite_contact_list >>>>>>> AS SELECT * FROM communication.user_contact_list >>>>>>> WHERE user_id IS NOT NULL AND contact_name IS NOT NULL >>>>>>> AND contact_id IS NOT NULL AND is_favorite IS NOT NULL >>>>>>> PRIMARY KEY(user_id, is_favorite, contact_name, contact_id) >>>>>>> >>>>>>> If the flag is_favorite is not updated very often the write perf hit >>>>>>> due to materialized view is acceptable. >>>>>>> >>>>>>> On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. <i...@hotmail.com> wrote: >>>>>>> >>>>>>>> Jack/ Michael, >>>>>>>> >>>>>>>> Thanks for answering. >>>>>>>> >>>>>>>> How big?: Less then one hundred contacts by user is the normal. >>>>>>>> >>>>>>>> Update requirements: The UPDATE requirements are all around each >>>>>>>> user “favoriting/unfavoriting” the contacts . Deleting is not very >>>>>>>> frequent. >>>>>>>> >>>>>>>> Does that mean that in C* 3.02 , for this use case to work, the >>>>>>>> contact name must be part of a composite partition key in order to >>>>>>>> allow >>>>>>>> sorting by contact_name like this ? : >>>>>>>> >>>>>>>> CREATE TABLE communication.user_contact_list ( >>>>>>>> user_id uuid, >>>>>>>> contact_name text, >>>>>>>> is_favorite boolean, >>>>>>>> contact_id uuid, >>>>>>>> created_at timeuuid, >>>>>>>> favorite_at timestamp, >>>>>>>> PRIMARY KEY ((user_id, contact_name), is_favorite) >>>>>>>> ) WITH CLUSTERING ORDER BY (contact_name ASC); >>>>>>>> >>>>>>>> Query: Select * from user_contact_list where user_id = :userid and >>>>>>>> is_favorite = true order by contact_name asc; >>>>>>>> >>>>>>>> Looks like each contact as a row/clustering key will be the way to >>>>>>>> go. >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> IPVP >>>>>>>> >>>>>>>> >>>>>>>> From: Laing, Michael <michael.la...@nytimes.com> >>>>>>>> <michael.la...@nytimes.com> >>>>>>>> Reply: user@cassandra.apache.org <user@cassandra.apache.org>> >>>>>>>> <user@cassandra.apache.org> >>>>>>>> Date: January 9, 2016 at 11:51:27 AM >>>>>>>> To: user@cassandra.apache.org <user@cassandra.apache.org>> >>>>>>>> <user@cassandra.apache.org> >>>>>>>> Subject: Re: Modeling contact list, plain table or List >>>>>>>> >>>>>>>> Note that in C* 3.02 the second query is invalid: >>>>>>>> >>>>>>>> cqlsh> Select * from communication.user_contact_list where user_id >>>>>>>> = 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order >>>>>>>> by contact_name asc; >>>>>>>> >>>>>>>> *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY >>>>>>>> column "is_favorite" cannot be restricted as preceding column >>>>>>>> "contact_name" is not restricted"* >>>>>>>> >>>>>>>> On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky < >>>>>>>> jack.krupan...@gmail.com> wrote: >>>>>>>> >>>>>>>>> How big is each contact list expected to be? Dozens? Hundreds? >>>>>>>>> Thousands? If just dozens, a simple list column would seem >>>>>>>>> sufficient. If >>>>>>>>> thousands, the row (not partition) would get kind of bloated. >>>>>>>>> >>>>>>>>> What requirements do you have for updating? If updating contacts >>>>>>>>> and lots of contacts, I think I'd prefer each contact as a >>>>>>>>> row/clustering >>>>>>>>> key. Nice to be able to do selective queries to return slices of the >>>>>>>>> clustering key values, which is not so easy if they are all just a >>>>>>>>> single >>>>>>>>> list column. >>>>>>>>> >>>>>>>>> -- Jack Krupansky >>>>>>>>> >>>>>>>>> On Fri, Jan 8, 2016 at 6:31 PM, Isaac P. <i...@hotmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi everyone >>>>>>>>>> >>>>>>>>>> What would perform better while modeling a simple user contact >>>>>>>>>> list that will be used mainly to select the recipients for/from/to >>>>>>>>>> messages ? >>>>>>>>>> >>>>>>>>>> a) Individual rows to each (user, contact) pair so a select would >>>>>>>>>> fetch all the rows to retrieve all the contacts from a given user. >>>>>>>>>> >>>>>>>>>> or >>>>>>>>>> >>>>>>>>>> b) A single row for each user containing the List<Contact> UDT. >>>>>>>>>> >>>>>>>>>> Aside of the basic CRUD, the queries will be the following ones: >>>>>>>>>> >>>>>>>>>> Select * from user_contact_list where user_id = :userid order by >>>>>>>>>> contact_name asc >>>>>>>>>> >>>>>>>>>> Select * from user_contact_list where user_id = :userid and >>>>>>>>>> is_favorite = true order by contact_name asc >>>>>>>>>> >>>>>>>>>> After reading this >>>>>>>>>> https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html >>>>>>>>>> the table is looking like this: >>>>>>>>>> >>>>>>>>>> CREATE TABLE communication.user_contact_list ( >>>>>>>>>> user_id uuid, >>>>>>>>>> contact_id uuid, >>>>>>>>>> contact_name text, >>>>>>>>>> created_at timeuuid, >>>>>>>>>> is_favorite boolean, >>>>>>>>>> favorite_at timestamp, >>>>>>>>>> PRIMARY KEY (user_id, contact_name, is_favorite) >>>>>>>>>> ); >>>>>>>>>> >>>>>>>>>> Any guidance will be appreciated. >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> IPVP >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >