Column Family per User

2012-04-18 Thread Trevor Francis
Our application has users who can write upwards of 50 million records per day. However, they all write the same record format (20 fields…columns). Should I put each user in their own column family, even though the column family schema will be the same per user? Would this help with

Re: Column Family per User

2012-04-18 Thread Janne Jalkanen
Each CF takes a fair chunk of memory regardless of how much data it has, so this is probably not a good idea, if you have lots of users. Also using a single CF means that compression is likely to work better (more redundant data). However, Cassandra distributes the load across different nodes

Re: Column Family per User

2012-04-18 Thread Trevor Francis
Janne, Of course, I am new to the Cassandra world, so it is taking some getting used to as I work out how everything translates into my MySQL head. We are building an enterprise application that will ingest log information and provide metrics and trending based upon the data contained in the

Re: Column Family per User

2012-04-18 Thread Dave Brosius
Your design should be around how you want to query. If you are only querying by user, then having a user as part of the row key makes sense. To manage row size, you should think of a row as being a bucket of time. Cassandra supports a large (but not without bounds) row size. To manage row size
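
A minimal sketch of the "row as a bucket of time" idea, in Python, assuming a row key made of the user id plus the start of a fixed-width time bucket. The key layout, the `row_key` helper, and the one-hour default width are illustrative assumptions, not something specified in this thread; the right width has to be found by experiment.

```python
from datetime import datetime, timezone

# Assumed bucket width (one hour). This is a tuning knob, not a recommendation.
BUCKET_SECONDS = 3600


def row_key(user_id: str, ts: datetime, bucket_seconds: int = BUCKET_SECONDS) -> str:
    """Build a hypothetical row key of the form '<user>:<bucket start in epoch seconds>'."""
    epoch = int(ts.timestamp())
    # Round down to the start of the bucket that contains this timestamp.
    bucket_start = epoch - (epoch % bucket_seconds)
    return f"{user_id}:{bucket_start}"


if __name__ == "__main__":
    a = datetime(2012, 4, 18, 15, 48, 30, tzinfo=timezone.utc)
    b = datetime(2012, 4, 18, 15, 59, 59, tzinfo=timezone.utc)
    c = datetime(2012, 4, 18, 16, 0, 0, tzinfo=timezone.utc)
    print(row_key("alice", a))  # same row as b
    print(row_key("alice", b))
    print(row_key("alice", c))  # next bucket, so a new row
```

All writes for one user within the same bucket land in one row, which bounds row width by the bucket length rather than by the lifetime of the user.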

Re: Column Family per User

2012-04-18 Thread Trevor Francis
for that time period.

Re: Column Family per User

2012-04-18 Thread Dave Brosius
Yes, in this Cassandra model, time wouldn't be a column value; it would be part of the column name. Depending on how you want to access your data (for example, "give me all data points for time X") and how many separate data points you have for time X, you might consider packing all the data for a time in one
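
The two layouts mentioned here can be sketched roughly as follows. This is plain Python with no Cassandra client involved; the helper names, the big-endian timestamp prefix, and the use of JSON for the packed value are all assumptions made for illustration.

```python
import json
import struct
from typing import Dict, Tuple


def column_name(ts_millis: int, field: str) -> bytes:
    """One column per (time, field): the timestamp is part of the column name.

    A big-endian 8-byte prefix makes byte-wise ordering match time ordering
    for non-negative timestamps.
    """
    return struct.pack(">q", ts_millis) + b":" + field.encode("utf-8")


def packed_column(ts_millis: int, record: Dict[str, object]) -> Tuple[bytes, bytes]:
    """One column per time: all fields for that time packed into a single value."""
    name = struct.pack(">q", ts_millis)
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return name, value


if __name__ == "__main__":
    # Layout 1: separate columns, sorted by time and then by field name.
    print(column_name(1334764110000, "status"))
    # Layout 2: a single column holding the whole record for that time.
    print(packed_column(1334764110000, {"status": 200, "bytes": 512}))
```

Packing trades the ability to read a single field cheaply for fewer, larger columns; which side wins depends on whether queries usually want every data point for time X.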

Re: Column Family per User

2012-04-18 Thread Trevor Francis
Regarding rotating, I was thinking about the concept of log rotation, where you write to a file for a specific period of time, then create a new file and write to that one once the period has elapsed. So yes, it closes a row and opens another row. Since I will be generating analytics every 15
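
On the read side, rotating rows like this means a query over a time range has to visit every row whose bucket overlaps that range. A rough sketch, assuming row keys of the hypothetical form '<user>:<bucket start in epoch seconds>' and a bucket width passed in as a parameter (the width is not taken from this thread):

```python
from typing import List


def bucket_starts(start_epoch: int, end_epoch: int, bucket_seconds: int) -> List[int]:
    """Return the start of every bucket overlapping [start_epoch, end_epoch]."""
    if end_epoch < start_epoch:
        return []
    # Round the range start down to its bucket boundary, then step by bucket width.
    first = start_epoch - (start_epoch % bucket_seconds)
    return list(range(first, end_epoch + 1, bucket_seconds))


def row_keys_for_range(user_id: str, start_epoch: int, end_epoch: int,
                       bucket_seconds: int) -> List[str]:
    """List the hypothetical row keys a reader would fetch for one user and time range."""
    return [f"{user_id}:{b}" for b in bucket_starts(start_epoch, end_epoch, bucket_seconds)]


if __name__ == "__main__":
    # A range that straddles one bucket boundary touches two rows.
    print(row_keys_for_range("alice", 3700, 7300, 3600))
```

Narrow buckets keep rows small but multiply the number of rows each analytics pass has to read, which is the rows-versus-width balance that needs experimentation.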

Re: Column Family per User

2012-04-18 Thread Dave Brosius
It seems to me you are on the right track. Finding the right balance of number of rows vs. row width is the part that will take the most experimentation.