Re: Counters question - is there a better way to count

2013-12-06 Thread Alex Popescu
On Thu, Dec 5, 2013 at 7:44 AM, Christopher Wirt chris.w...@struq.com wrote:

 I want to build a really simple column family which counts the occurrence
 of a single event X.


The guys from Disqus are big into counters:

https://www.youtube.com/watch?v=A2WdS0YQADo

http://www.slideshare.net/planetcassandra/cassandra-at-disqus (relevant
slides start at 25)


-- 

:- a)


Alex Popescu
@al3xandru


Counters question - is there a better way to count

2013-12-05 Thread Christopher Wirt
I want to build a really simple column family which counts the occurrence of
a single event X.

Once we reach Y occurrences of X, the counter resets to 0.

The obvious way to do this is with a counter CF.

 

    CREATE TABLE xcounter1 (
        uid uuid,
        someid int,
        count counter,
        PRIMARY KEY (uid, someid)
    );
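
For reference, a minimal sketch of how this counter table would be driven, assuming the partition key column is named uid as in the PRIMARY KEY clause. The UUID literal, the someid value 42 and the reset amount 10 (standing in for Y) are made up for illustration. Counter columns can only be incremented or decremented, so the reset below subtracts the value just read rather than writing 0, and that read-then-write is not atomic.

```cql
-- Increment on each occurrence of event X.
-- Counter columns are written with UPDATE, not INSERT.
UPDATE xcounter1 SET count = count + 1
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42;

-- Read every counter for one uid in a single-partition query.
SELECT * FROM xcounter1
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000;

-- "Reset" once the value read reaches Y (assumed to be 10 here) by
-- subtracting what was read. An increment that lands between the read
-- and this update is kept as a non-zero remainder.
UPDATE xcounter1 SET count = count - 10
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42;
```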

 

This is how I've always done it in the past, but I've been told to avoid
counters for various reasons (performance, consistency, etc.).

I'm not too bothered about 100% consistency; however, read performance is
certainly a big concern.

So, to avoid counters, I was thinking I could do something like this:

 

    CREATE TABLE xcounter2 (
        uid uuid,
        someid int,
        time timeuuid,
        PRIMARY KEY (uid, someid, time)
    );

 

Then retrieve all events for a (uid, someid) pair and count them in memory,
deleting all records for that pair once I hit Y.
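
A sketch of the statements this design implies, assuming the partition key column is named uid and a Cassandra version that accepts a DELETE restricted to a clustering-key prefix. The UUID literal and the someid value 42 are illustrative only.

```cql
-- Record one occurrence; now() generates the timeuuid server-side.
INSERT INTO xcounter2 (uid, someid, time)
VALUES (123e4567-e89b-12d3-a456-426614174000, 42, now());

-- Count occurrences for one (uid, someid) pair. Counting server-side
-- still reads every matching row; it only avoids shipping them to the
-- client.
SELECT COUNT(*) FROM xcounter2
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42;

-- Reset after reaching Y: delete the whole (uid, someid) slice.
DELETE FROM xcounter2
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42;
```

The delete leaves tombstones that later reads of the same partition have to skip until compaction purges them, so frequent resets may work against the read-performance goal.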

 

Or I could do this:

    CREATE TABLE xcounter3 (
        uid uuid,
        someid int,
        time timeuuid,
        Ycount int,
        PRIMARY KEY (uid, someid, time)
    );

 

Insert a running count ('Ycount') with each occurrence of the event, read
back only the most recently inserted Ycount, and delete all records once it
reaches the magic Y value.
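
One way the "read only the last value" step could be expressed is to store the newest event first, so the latest Ycount is the first row of the slice. This is a sketch under assumptions: the table name xcounter3_desc, the UUID literal, the someid value 42 and the running count 7 are all hypothetical.

```cql
-- Hypothetical variant of xcounter3: within each someid, rows are kept
-- newest-first, so the latest running count is the first row returned.
CREATE TABLE xcounter3_desc (
    uid uuid,
    someid int,
    time timeuuid,
    Ycount int,
    PRIMARY KEY (uid, someid, time)
) WITH CLUSTERING ORDER BY (someid ASC, time DESC);

-- Record an occurrence; the writer supplies the new running count.
INSERT INTO xcounter3_desc (uid, someid, time, Ycount)
VALUES (123e4567-e89b-12d3-a456-426614174000, 42, now(), 7);

-- Read only the most recent running count for one (uid, someid) pair.
SELECT Ycount FROM xcounter3_desc
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42
 LIMIT 1;
```

Note that the writer has to know the previous count before inserting, so each write implies a read, and two concurrent writers can insert the same Ycount.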

 

 

Does anyone have any thoughts or insight on which of these is likely to give
me the best read performance?

There will be hundreds of someid values per uid. Reads will be 5-10x the
writes.

 

 

Thanks,

 

Chris



Re: Counters question - is there a better way to count

2013-12-05 Thread Andy Twigg
How many distinct uid,someid pairs will you have?



Re: Counters question - is there a better way to count

2013-12-05 Thread Przemek Maciolek
Some big systems have been built on Cassandra's counters (such as Rainbird:
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011)
and seem to be doing a great job.

If you are concerned with performance, then maybe a memory-based store
(such as Redis) will better suit your case, as long as the data fits in
memory; considering the data model, I guess it might.

If you are going to stick with Cassandra, then tweaking the compaction
threshold can make a visible difference to read performance, at least
from what I have seen. You can also consider changing the PRIMARY KEY to
((uid, someid), time) - this will make the partition key out of uid+someid,
rather than just uid. Depending on the access pattern, it might help.
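
A sketch of what the composite partition key would look like, using the columns of the time-series design from the original post. The table name xcounter2_by_pair, the UUID literal and the someid value 42 are hypothetical.

```cql
-- Hypothetical table: uid and someid together form the partition key,
-- and time is the only clustering column.
CREATE TABLE xcounter2_by_pair (
    uid uuid,
    someid int,
    time timeuuid,
    PRIMARY KEY ((uid, someid), time)
);

-- Both partition key columns must be supplied to read a partition.
SELECT COUNT(*) FROM xcounter2_by_pair
 WHERE uid = 123e4567-e89b-12d3-a456-426614174000 AND someid = 42;
```

The trade-off is that fetching every someid for a given uid is no longer one single-partition read; it becomes one query per someid.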





RE: Counters question - is there a better way to count

2013-12-05 Thread Christopher Wirt
Hi Andy,

There will be tens of millions of uids, each with hundreds of someids, being
accessed each day.

 

Hi Przemek, 

We currently use counter column families, but they are some of our slowest.
(They are also some of our biggest, so the counter type might not be the
issue.)

 

We have a strong need for a cross-DC solution. We could use Redis and handle
the replication ourselves, but are hoping not to have to do this.

 

Regarding tweaking the compaction thresholds, do you mean increasing or
decreasing min_compaction_threshold and max_compaction_threshold? I guess
decreasing both values will result in more compaction, so fewer SSTables to
read and faster reads (at the cost of heavier CPU/disk usage)?
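
In CQL3 these thresholds are exposed as sub-options of the size-tiered compaction strategy. A sketch, assuming the xcounter1 table from the original post and size-tiered compaction; the value 2 is only illustrative (the defaults are 4 and 32).

```cql
-- Lowering min_threshold lets a compaction start once fewer similar-sized
-- SSTables have accumulated, which should reduce the number of SSTables a
-- read touches at the price of more compaction I/O.
ALTER TABLE xcounter1
  WITH compaction = { 'class': 'SizeTieredCompactionStrategy',
                      'min_threshold': 2,
                      'max_threshold': 32 };
```

Whether this actually helps depends on the write rate and on how much spare disk and CPU capacity the nodes have, so it is worth measuring on one table before applying it broadly.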

 

We will always need to read all of a uid's someids, so adding someid to the
partition key is not an option at this time.

 

Thanks,

Chris

 

 

 
