Re: Is this the correct data model thinking?

2012-02-28 Thread aaron morton
A.) store ALL the data associated with the user onto a single user's row-key. Some user keys may be small, others may get larger over time depending upon activity. I would go with this. The important thing is supporting the read queries. Cheers Aaron - Aaron Morton Freelance

Re: sstable image/pic ?

2012-02-28 Thread aaron morton
The on-disk layout is described here; not sure how correct it is nowadays. http://wiki.apache.org/cassandra/ArchitectureSSTable There are multiple files involved; this will give you an idea of the read and write path: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ Hope that helps.

Re: TimeUUID

2012-02-28 Thread R. Verlangen
For querying purposes it would be better to use readable strings, because you can really get information out of them. A TimeUUID is just a unique value based on time, but not only on the time. 2012/2/28 Tamar Fraenkel ta...@tok-media.com Hi! I have a column family where I use rows as time buckets.

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-28 Thread aaron morton
Have you tried lowering the batch size and increasing the timeout? Even just to get it to work. If you get a TimedOutException it means the number of servers required by the consistency level (CL) did not respond in time. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On

Re: TimeUUID

2012-02-28 Thread aaron morton
Not a great deal of difference; personally I would stick with seconds since epoch (it is probably slightly faster). Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/02/2012, at 7:24 PM, Tamar Fraenkel wrote: Hi! I have a column

Impact of old data on performance

2012-02-28 Thread Stefan Reek
Hi All, We are running a 3-node cluster with Cassandra 0.6.13. We are in the process of upgrading to 1.x, but can't do so for a while because we can't take the cluster offline. Until now 0.6.13 has run without problems, but lately we are getting some performance issues. We are getting

Re: sstable image/pic ?

2012-02-28 Thread Hontvári József Levente
* Does the column name get stored for every col/val for every key (which sort of worries me for long column names) Yes, the column name is stored with each value for every key, but it may not matter if you switch on compression, which AFAIK has only advantages and will be the default. I

Re: sstable image/pic ?

2012-02-28 Thread Franc Carter
2012/2/28 Hontvári József Levente hontv...@flyordie.com * Does the column name get stored for every col/val for every key (which sort of worries me for long column names) Yes, the column name is stored with each value for every key, but it may not matter if you switch on compression,

Re: TimeUUID

2012-02-28 Thread Tamar Fraenkel
Thanks, makes my life easier. Tamar On Tue, Feb 28, 2012 at 10:23 AM, aaron morton aa...@thelastpickle.com wrote: Not a great deal of difference; personally I would stick with seconds since epoch (it is probably slightly faster). Cheers - Aaron Morton Freelance Developer

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-28 Thread Patrik Modesto
I'll alter these settings and will let you know. Regards, P. On Tue, Feb 28, 2012 at 09:23, aaron morton aa...@thelastpickle.com wrote: Have you tried lowering the batch size and increasing the timeout? Even just to get it to work. If you get a TimedOutException it means CL number of

GCInspector causing slow writes?

2012-02-28 Thread Neil Dolling
When writing to Cassandra (v 1.0.7) I'm seeing occasional delays of up to 4 seconds. Below is an excerpt from the system.log where we are seeing the delays. Is this a result of GC, and is it worth tuning these settings to fix it? If so, any suggestions? Adjusting memtable_total_space_in_mb? *DEBUG

Failed to join ring (NAT)

2012-02-28 Thread Richard Evans
I have a small ring of two nodes running successfully on AWS. In order to understand Cassandra's support for NAT, I have tried to add another node outside AWS on a machine behind NAT. When I try to join the ring, there is a 30s pause after starting the messaging service and then it fails, unable to

Re: Server crashed due to OutOfMemoryError: Java heap space

2012-02-28 Thread Vitalii Tymchyshyn
Hello. Any messages about GC earlier in the logs? Cassandra server monitors memory and starts complaining in advance if memory gets full. Any chance you've got a full key delete-only scenario for some column families? Cassandra has a bug not being able to flush such memtables. I've filled a

Re: TimeUUID

2012-02-28 Thread Paul Loy
In a multi-server environment, TimeUUID may be the better choice to avoid key collisions. On Monday, February 27, 2012, Tamar Fraenkel wrote: Hi! I have a column family where I use rows as time buckets. What I do is take epoch time in seconds, and round it to 1 hour (taking the result of

Re: TimeUUID

2012-02-28 Thread Dave Brosius
Given that these rows are meant to be time buckets, you would want collisions; in fact, that would be the standard way of working. So IMO the UUID just removes the ability to bucket data and would not be wanted. On 02/28/2012 10:30 AM, Paul Loy wrote:
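The bucketing idea discussed in this thread can be illustrated with a small sketch. This is only an illustration, not the poster's code: the function name `hour_bucket` is hypothetical, and rounding *down* to the hour boundary is an assumption (the original message is cut off before it says how the rounding is done). It shows why a rounded epoch value gives the wanted collisions, while a time-based UUID does not.

```python
import uuid

BUCKET_SECONDS = 3600  # one-hour buckets, as described in the thread


def hour_bucket(epoch_seconds: int) -> int:
    """Round an epoch timestamp (in seconds) down to the start of its hour.

    Rounding down is an assumption; the thread does not say which way
    the rounding goes.
    """
    return epoch_seconds - (epoch_seconds % BUCKET_SECONDS)


base = 1330423200  # an arbitrary timestamp that falls exactly on an hour boundary

# Two events inside the same hour map to the same row key (a wanted collision).
key_a = hour_bucket(base + 5)
key_b = hour_bucket(base + 3599)

# An event in the next hour maps to the next bucket.
key_c = hour_bucket(base + 3600)

# Time-based UUIDs are unique per call, so they cannot serve as a bucket key.
uuid_a = uuid.uuid1()
uuid_b = uuid.uuid1()
```

With keys like these, every writer on every server derives the same row key for the same hour without coordination, which is the behaviour the bucket design relies on.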

CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Philip Shon
I have not found any examples of utilizing a CompositeType or DynamicCompositeType as a row key. Is doing this frowned upon? All the examples I've seen have been using a CompositeType only for Column names (or values). My particular use case involves having the two components in the key being a

Re: CompositeType/DynamicCompositeType for Row Key

2012-02-28 Thread Chris Gerken
Phil, That's the problem with examples :) Row keys can be composite values. That works just fine. Was there something in particular you were trying to do? - Chris Chris Gerken chrisger...@mindspring.com 512.587.5261 http://www.linkedin.com/in/chgerken On Feb 28, 2012, at 10:25 AM,

Re: Impact of old data on performance

2012-02-28 Thread Dan Retzlaff
Hi Stefan. Can you share the output of nodetool cfstats? On Tue, Feb 28, 2012 at 1:50 AM, Stefan Reek ste...@unitedgames.com wrote: Hi All, We are running a 3-node cluster with Cassandra 0.6.13. We are in the process of upgrading to 1.x, but can't do so for a while because we can't take the

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
@Aaron: Are you suggesting 3 nodes (rather than 2) to allow quorum operations even at the temporary loss of 1 node from the cluster's reach? I understand this, but another question just popped up in my mind, probably since I'm not very experienced at managing Cassandra, so I'm unaware whether it may be

Re: Implications of length of column names

2012-02-28 Thread Maxim Potekhin
When I migrated data from our RDBMS, I hashed column names to integers. This makes for some footwork, but the space gain is clearly there so it's worth it. I de-hash on read. Maxim On 2/10/2012 5:15 PM, Narendra Sharma wrote: It is good to have short column names. They save space all the
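One way to picture the "hash on write, de-hash on read" approach is a reversible lookup table. The sketch below is an assumption about how such a scheme could work, not the poster's implementation: the class `ColumnNameCodec` and the sample column names are hypothetical, and a plain dictionary is used instead of a real hash function because a true hash cannot be reversed on read.

```python
class ColumnNameCodec:
    """Maps long column names to small integers and back (hypothetical helper)."""

    def __init__(self):
        self._to_id = {}    # column name -> integer id
        self._to_name = []  # integer id -> column name

    def encode(self, name: str) -> int:
        # Assign the next free integer the first time a name is seen.
        if name not in self._to_id:
            self._to_id[name] = len(self._to_name)
            self._to_name.append(name)
        return self._to_id[name]

    def decode(self, column_id: int) -> str:
        return self._to_name[column_id]


codec = ColumnNameCodec()

# Hypothetical row with long column names, as it might come from an RDBMS.
row = {
    "customer_shipping_address": "12 Main St",
    "customer_email": "a@example.com",
}

# Write path: store short integer column names instead of the long strings.
stored = {codec.encode(name): value for name, value in row.items()}

# Read path: translate the integers back to the original names.
restored = {codec.decode(col_id): value for col_id, value in stored.items()}
```

The mapping itself has to be persisted somewhere and shared by all clients, which is part of the "footwork" mentioned above.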

Re: Frequency of Flushing in 1.0

2012-02-28 Thread Xaero S
Thank you Aaron and others. That helped and we were able to limit the commitlog disk usage. We will be doing some tests by changing the memtable_total_space_in_mb param and see how that goes. On Mon, Feb 27, 2012 at 12:51 PM, aaron morton aa...@thelastpickle.com wrote: yes, reducing

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
If you have 3 nodes with RF=3, you can continue the service on Cassandra even if one of the nodes fails (by hardware or software failure). One other benefit is that you can shut down one node for maintenance or patching without service interruption. If you run your service with 2 nodes and RF=2, your

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
If you run your service with 2 nodes and RF=2, your data will be replicated but your service will not be redundant. (You can't stop both of the nodes.) If your service doesn't need strong consistency (i.e. you allow Cassandra to return old data after a write, and possibly lost writes), you can use CL=ONE for
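The arithmetic behind the 2-node versus 3-node discussion can be sketched as follows. It assumes the usual quorum definition of floor(RF / 2) + 1 replicas; the function names are hypothetical and the sketch only counts replicas, ignoring everything else that affects availability.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM operation: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_replicas: int) -> int:
    """Replicas that can be down while the operation can still succeed."""
    return replication_factor - required_replicas


# RF=3: QUORUM needs 2 replicas, so one node can be down.
rf3_quorum = quorum(3)
rf3_spare = tolerated_failures(3, rf3_quorum)

# RF=2: QUORUM needs both replicas, so no node can be down.
rf2_quorum = quorum(2)
rf2_spare = tolerated_failures(2, rf2_quorum)

# RF=2 with CL=ONE: only one replica must respond, so one node can be down,
# at the cost of possibly reading stale data.
rf2_one_spare = tolerated_failures(2, 1)
```

This is why a 2-node, RF=2 cluster only stays available through a node outage when the application accepts CL=ONE.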

Re: Using cassandra at minimal expenditures

2012-02-28 Thread Ertio Lew
Thanks. I don't think I need high consistency (as per my app requirements), so I might be fine with CL.ONE instead of quorum, and I'm probably going to be OK with a 2-node cluster initially. Could you guys also recommend some minimum memory to start with? Of course that would depend on my