MV is right now your best choice for this kind of sorting behavior. Secondary index (whatever the impl, SASI or Lucene) has a cost of scatter-gather if your cluster scale out. With MV you're at least guaranteed to hit a single node everytime
On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote: > Can you use the lucene index https://github.com/Stratio/ > cassandra-lucene-index ? > > On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth <benjamin.r...@jaumo.com> > wrote: > >> Hi! >> >> I have a frequently used pattern which seems to be quite costly in CS. >> The pattern is always the same: I have a unique key and a sorting by a >> different field. >> >> To give an example, here a real life example from our model: >> CREATE TABLE visits.visits_in ( >> user_id int, >> user_id_visitor int, >> created timestamp, >> PRIMARY KEY (user_id, user_id_visitor) >> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC) >> >> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS >> SELECT user_id, created, user_id_visitor >> FROM visits.visits_in >> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor >> IS NOT NULL >> PRIMARY KEY (user_id, created, user_id_visitor) >> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC) >> >> This simply represents people, that visited my profile sorted by date >> desc but only one entry per visitor. >> Other examples with the same pattern could be a whats-app-like inbox >> where the last message of each sender is shown by date desc. There are lots >> of examples for that pattern. >> >> E.g. in redis I'd just use a sorted set, where the key could be like >> "visits_${user_id}", set key would be user_id_visitor and score >> the created timestamp. >> In MySQL I'd create the table with PK on user_id + user_id_visitor and >> create an index on user_id + created >> In C* i use an MV. >> >> Is this the most efficient approach? >> I also could have done this without an MV but then the situation in our >> app would be far more complex. >> I know that denormalization is a common pattern in C* and I don't >> hesitate to use it but in this case, it is not as simple as it's not an >> append-only case but updates have to be handled correctly. >> If it is the first visit of a user, it's that simple, just 2 inserts in >> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd >> visit has to be deleted from the denormalized table before. Otherwise the >> visit would not be unique any more. >> Handling this case without an MV requires a lot more effort, I guess even >> more effort than just using an MV. >> 1. You need kind of app-side locking to deal with race conditions >> 2. Read before write is required to determine if an old record has to be >> deleted >> 3. At least CL_QUORUM is required to make sure that read before write is >> always consistent >> 4. Old record has to be deleted on update >> >> I guess, using an MV here is more efficient as there is less roundtrip >> between C* and the app to do all that and the MV does not require strong >> consistency as MV updates are always local and are eventual consistent when >> the base table is. So there is also no need for distributed locks. >> >> I ask all this as we now use CS 3.x and have been advised that 3.x is >> still not considered really production ready. >> >> I guess in a perfect world, this wouldn't even require an MV if SASI >> indexes could be created over more than 1 column. E.g. in MySQL this case >> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering >> by Partition Key (which should to be done anyway) and sorting by a field >> would perfectly do the trick. But from the docs, this is not possible right >> now. >> >> Does anyone see a better solution or are all my assumptions correct? >> >> -- >> Benjamin Roth >> Prokurist >> >> Jaumo GmbH · www.jaumo.com >> Wehrstraße 46 · 73035 Göppingen · Germany >> Phone +49 7161 304880-6 · Fax +49 7161 304880-1 >> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer >> > >