Our application has users who can write upwards of 50 million records per
day. However, they all write records in the same format (20 fields, i.e. columns).
Should I put each user in their own column family, even though the column
family schema will be the same per user?
Would this help with
Each CF takes a fair chunk of memory regardless of how much data it has, so
this is probably not a good idea if you have lots of users. Also, using a
single CF means that compression is likely to work better (more redundant data).
However, Cassandra distributes the load across different nodes
Janne,
Of course, I am new to the Cassandra world, so it takes some getting used to,
understanding how everything translates into my MySQL head.
We are building an enterprise application that will ingest log information and
provide metrics and trending based upon the data contained in the
Your design should be around how you want to query. If you are only querying
by user, then having a user as part of the row key makes sense. To manage row
size, you should think of a row as being a bucket of time. Cassandra supports a
large (but not without bounds) row size. To manage row size
for that time period.
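
As a rough illustration of the "row as a bucket of time" idea, here is a minimal sketch in plain Python (no Cassandra client involved). The key layout `<user_id>:<bucket start>`, the `row_key` helper, and the one-hour bucket width are all assumptions made for the example, not something specified in this thread.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # hypothetical bucket width: one row per user per hour


def row_key(user_id: str, ts: datetime, bucket_seconds: int = BUCKET_SECONDS) -> str:
    """Return a row key of the form '<user_id>:<bucket start in epoch seconds>'."""
    epoch = int(ts.timestamp())
    # Round the timestamp down to the start of its bucket.
    bucket_start = epoch - (epoch % bucket_seconds)
    return f"{user_id}:{bucket_start}"


# Two writes in the same hour share a row; the next hour starts a new row.
a = row_key("user42", datetime(2012, 4, 18, 15, 48, 0, tzinfo=timezone.utc))
b = row_key("user42", datetime(2012, 4, 18, 15, 59, 59, tzinfo=timezone.utc))
c = row_key("user42", datetime(2012, 4, 18, 16, 0, 0, tzinfo=timezone.utc))
```

With a key like this, a query for one user over a time range only has to read the rows whose buckets overlap that range.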
- Original Message -
From: Trevor Francis trevor.fran...@tgrahamcapital.com
Sent: Wed, April 18, 2012 15:48
Subject: Re: Column Family per User
Janne,
Of course, I am new to the Cassandra world, so it is taking some getting used
to understand how
Yes, in this Cassandra model, time wouldn't be a column value; it would be part
of the column name. Depending on how you want to access your data (give me all
data points for time X) and how many separate data points you have for time X,
you might consider packing all the data for a time in one
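
To make "time as part of the column name" concrete, here is a small in-memory sketch in Python. It is only a stand-in for a wide row, not Cassandra's API: the `(epoch_seconds, field_name)` tuple names, the field names, and the `slice_by_time` helper are hypothetical. It relies on the fact that columns within a row are kept sorted by name, so a time range becomes a contiguous slice.

```python
from bisect import bisect_left

# In-memory stand-in for one wide row: column name -> value.
# Column names are (epoch_seconds, field_name) tuples; timestamps are arbitrary.
row = {
    (1334764080, "bytes"): "512",
    (1334764080, "status"): "200",
    (1334764081, "bytes"): "1024",
    (1334764081, "status"): "404",
    (1334764095, "status"): "200",
}


def slice_by_time(row, start, end):
    """Return (name, value) pairs whose time component is in [start, end)."""
    names = sorted(row)  # a real row is already stored in column-name order
    lo = bisect_left(names, (start,))  # first name at or after `start`
    hi = bisect_left(names, (end,))    # first name at or after `end`
    return [(name, row[name]) for name in names[lo:hi]]


at_time_x = slice_by_time(row, 1334764080, 1334764081)
```

Here `at_time_x` holds every data point recorded at the first timestamp, which is the "give me all data points for time X" access pattern.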
Regarding rotating, I was thinking about the concept of log rotation, where you
write to a file for a specific period of time, then create a new file and
write to it for the next period. So yes, it closes a row and opens
another row.
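
The log-rotate analogy can be sketched as grouping a time-ordered stream of events into rows, starting a new row each time a period boundary is crossed. This is a plain-Python illustration only; the `rotate_into_rows` helper and the 300-second period are hypothetical choices for the example.

```python
from itertools import groupby

ROTATE_SECONDS = 300  # hypothetical rotation period


def rotate_into_rows(events, rotate_seconds=ROTATE_SECONDS):
    """Group (epoch_seconds, payload) events, already sorted by time, into rows.

    Each row is keyed by the start of its rotation period; crossing a period
    boundary "closes" the current row and "opens" the next one.
    """
    def period_start(event):
        return event[0] - (event[0] % rotate_seconds)

    return {
        start: [payload for _, payload in group]
        for start, group in groupby(events, key=period_start)
    }


rows = rotate_into_rows([(0, "a"), (299, "b"), (300, "c"), (650, "d")])
```

In this run the events at 0 and 299 seconds share a row, while the events at 300 and 650 seconds each land in a later row.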
Since I will be generating analytics every 15
It seems to me you are on the right track. Finding the right balance of # rows
vs row width is the part that will take the most experimentation.
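
A back-of-the-envelope calculation can help frame the rows-versus-row-width experiments. The sketch below reads the 50 million records per day figure as a per-user upper bound, and assumes a steady write rate and one column per field; those readings are assumptions, as is the `columns_per_row` helper. The 2 billion figure is Cassandra's hard limit on columns per row; practical row sizes are usually kept well below it.

```python
RECORDS_PER_DAY = 50_000_000          # upper bound quoted in the thread
FIELDS_PER_RECORD = 20                # fields per record, from the thread
MAX_COLUMNS_PER_ROW = 2_000_000_000   # Cassandra's hard limit on columns per row
SECONDS_PER_DAY = 86_400


def columns_per_row(bucket_seconds, columns_per_record=FIELDS_PER_RECORD):
    """Columns written into one row, assuming a steady rate and one column per field."""
    records = RECORDS_PER_DAY * bucket_seconds // SECONDS_PER_DAY
    return records * columns_per_record


for label, seconds in [("day", 86_400), ("hour", 3_600), ("minute", 60)]:
    rows_per_day = SECONDS_PER_DAY // seconds
    width = columns_per_row(seconds)
    print(f"bucket={label}: {rows_per_day} rows/day, {width} columns/row, "
          f"within hard limit: {width <= MAX_COLUMNS_PER_ROW}")
```

Under these assumptions a one-day bucket already reaches a billion columns per row, while hourly or per-minute buckets trade that width for more rows.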