Re: Efficient model for a sorting

Vladimir Yudovin Tue, 04 Oct 2016 09:15:05 -0700

Hi Benjamin!

&gt;we now use CS 3.x and have been advised that 3.x is still not considered 
really production ready.


Did you consider using of 3.0.9? Actually it's 3.0 with almost an year fixes.


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




---- On Tue, 04 Oct 2016 07:27:54 -0400 Benjamin Roth 
&lt;benjamin.r...@jaumo.com&gt; wrote ---- 

Hi!


I have a frequently used pattern which seems to be quite costly in CS. The 
pattern is always the same: I have a unique key and a sorting by a different 
field.


To give an example, here a real life example from our model:
CREATE TABLE visits.visits_in (
    user_id int,
    user_id_visitor int,
    created timestamp,
    PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC)



CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
    SELECT user_id, created, user_id_visitor
    FROM visits.visits_in
    WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor IS 
NOT NULL
    PRIMARY KEY (user_id, created, user_id_visitor)
    WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)


This simply represents people, that visited my profile sorted by date desc but 
only one entry per visitor.
Other examples with the same pattern could be a whats-app-like inbox where the 
last message of each sender is shown by date desc. There are lots of examples 
for that pattern.




E.g. in redis I'd just use a sorted set, where the key could be like 
"visits_${user_id}", set key would be user_id_visitor and score the created 
timestamp.

In MySQL I'd create the table with PK on user_id + user_id_visitor and create 
an index on user_id + created
In C* i use an MV.


Is this the most efficient approach?
I also could have done this without an MV but then the situation in our app 
would be far more complex.
I know that denormalization is a common pattern in C* and I don't hesitate to 
use it but in this case, it is not as simple as it's not an append-only case 
but updates have to be handled correctly.
If it is the first visit of a user, it's that simple, just 2 inserts in base 
table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd visit has 
to be deleted from the denormalized table before. Otherwise the visit would not 
be unique any more.
Handling this case without an MV requires a lot more effort, I guess even more 
effort than just using an MV. 
1. You need kind of app-side locking to deal with race conditions
2. Read before write is required to determine if an old record has to be deleted
3. At least CL_QUORUM is required to make sure that read before write is always 
consistent
4. Old record has to be deleted on update


I guess, using an MV here is more efficient as there is less roundtrip between 
C* and the app to do all that and the MV does not require strong consistency as 
MV updates are always local and are eventual consistent when the base table is. 
So there is also no need for distributed locks.


I ask all this as we now use CS 3.x and have been advised that 3.x is still not 
considered really production ready.


I guess in a perfect world, this wouldn't even require an MV if SASI indexes 
could be created over more than 1 column. E.g. in MySQL this case is nothing 
else than a BTree. AFAIK SASI indices are also BTrees, filtering by Partition 
Key (which should to be done anyway) and sorting by a field would perfectly do 
the trick. But from the docs, this is not possible right now.



Does anyone see a better solution or are all my assumptions correct?



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Efficient model for a sorting

Reply via email to