Re: implementing a 'sorted set' on top of cassandra

Jonathan Haddad Sat, 14 Jan 2017 09:45:38 -0800

Sorted sets don't have a requirement of incrementing / decrementing.
They're commonly used for things like leaderboards where the values are
arbitrary.


In Redis they are implemented with 2 data structures for efficient lookups
of either key or value. No getting around that as far as I know.
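
To make the "2 data structures" point concrete, here is a rough sketch in
Python. Redis pairs a hash table with a skip list; the toy below swaps the
skip list for a plain sorted list kept in order with bisect, so it only
illustrates the idea and says nothing about Redis internals. The class and
method names are made up for this example.

```python
import bisect


class SortedSet:
    """Toy sorted set: a dict for member -> score, plus a sorted list for rank order."""

    def __init__(self):
        self._score = {}   # member -> score (lookup by key)
        self._ranked = []  # sorted list of (score, member) (lookup by score / rank)

    def add(self, member, score):
        old = self._score.get(member)
        if old is not None:
            # The member already exists: drop its old (score, member) entry first.
            i = bisect.bisect_left(self._ranked, (old, member))
            del self._ranked[i]
        self._score[member] = score
        bisect.insort(self._ranked, (score, member))

    def score(self, member):
        return self._score.get(member)

    def top(self, n):
        # Highest scores first.
        if n <= 0:
            return []
        return [(m, s) for s, m in reversed(self._ranked[-n:])]


board = SortedSet()
board.add("alice", 10)
board.add("bob", 30)
board.add("carol", 20)
board.add("alice", 40)  # arbitrary new value, not an increment
print(board.top(2))
```

Both structures have to be updated on every write, which is the cost of
being able to look up by member and by score.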

In Cassandra they would require using the score as a clustering column in
order to select top N scores (and paginate). That means a tombstone
whenever the value for a key in the set changes. In sets with high rates of
change that means a lot of tombstones and thus terrible performance.
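
As a rough illustration of why that hurts, here is a toy model in Python.
It is not Cassandra's storage engine; it only mimics a partition whose rows
are keyed by (score, member), where changing a score means deleting the old
row (leaving a tombstone) and inserting a new one. The table layout in the
comment and every name in the code are assumptions made up for this sketch.

```python
# Hypothetical table this toy stands in for (names are illustrative only):
#   CREATE TABLE leaderboard (
#       set_id text, score bigint, member text,
#       PRIMARY KEY ((set_id), score, member)
#   ) WITH CLUSTERING ORDER BY (score DESC, member ASC);


class ToyPartition:
    def __init__(self):
        self.rows = {}     # (score, member) -> True if live, False if tombstone
        self.current = {}  # member -> current score; the app has to track this somewhere

    def set_score(self, member, score):
        old = self.current.get(member)
        if old == score:
            return
        if old is not None:
            # The score is part of the row key, so the old row must be deleted.
            self.rows[(old, member)] = False
        self.rows[(score, member)] = True
        self.current[member] = score

    def top(self, n):
        """Return (top n live rows, number of rows scanned including tombstones)."""
        live, scanned = [], 0
        if n <= 0:
            return live, scanned
        for key in sorted(self.rows, key=lambda k: (-k[0], k[1])):
            scanned += 1
            if self.rows[key]:
                live.append((key[1], key[0]))
                if len(live) == n:
                    break
        return live, scanned

    def tombstones(self):
        return sum(1 for alive in self.rows.values() if not alive)


p = ToyPartition()
for s in range(1, 101):
    p.set_score("alice", s)  # 100 writes to one member
p.set_score("bob", 5)
print(p.tombstones(), p.top(2))
```

In this toy, one member changing score 100 times leaves 99 dead rows, and a
top-2 read has to step over most of them before it finds the second live
entry.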

On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan <doanduy...@gmail.com> wrote:

> Sorting on an "incremented" numeric value has always been a nightmare to
> do properly in C*
>
> Either use the counter type, but then no sorting is possible, since a
> counter cannot be used as the type of a clustering column (which is what
> allows sorting)
>
> Or use a simple numeric type on a clustering column, but then incrementing
> the value *concurrently* and *safely* is prohibitive (SELECT to fetch the
> current value + UPDATE ... IF value = <old_value>, plus retries)
>
>
>
> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth <benjamin.r...@jaumo.com>
> wrote:
>
> Whether your proposed solution is crazy depends on your needs :)
> It sounds like you can live with non-realtime data, so it is OK to cache
> it. Why precompute the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> That way redis has much less to do and can scale much better. And you are
> not forced to keep all data in RAM, since cached data is volatile and can
> be evicted on demand.
> Whether this is effective also depends on the size of your sets. Cassandra
> won't be able to sort them by score for you, so you will have to load the
> complete set into redis for caching and / or sort in your app on demand.
> This certainly won't work well for sets with millions of entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra <mto...@demandware.com>:
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
