Unsuccessful attempt to add a second node to a ring.
Hi Everybody! I'm trying to add a second node to an already operating one-node cluster. Some specs:
- cassandra 1.0.7
- both nodes have a routable listen_address and rpc_address
- ports are open: (from node2) telnet node1 7000 is successful
- the seeds parameter on node2 points to node1

[node1] nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns     Token
node1.ip   datacenter1  rack1  Up      Normal  74.33 KB  100.00%  0

- the initial token on node2 was specified

I see something like this in the logs on node2:

DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76) collectTimeOrderedData
INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667) JOINING: waiting for ring and schema information
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642 OutboundTcpConnection.java (line 206) attempting to connect to NODE1/node1.ip
DEBUG [ScheduledTasks:1] 2012-07-31 13:50:40,639 LoadBroadcaster.java (line 86) Disseminating load info ...
INFO [main] 2012-07-31 13:51:08,641 StorageService.java (line 667) JOINING: schema complete, ready to bootstrap
DEBUG [main] 2012-07-31 13:51:08,642 StorageService.java (line 554) ... got ring + schema info
INFO [main] 2012-07-31 13:51:08,642 StorageService.java (line 667) JOINING: getting bootstrap token
DEBUG [main] 2012-07-31 13:51:08,644 BootStrapper.java (line 138) token manually specified as 85070591730234615865843651857942052864
DEBUG [main] 2012-07-31 13:51:08,645 Table.java (line 387) applying mutation of row 4c

but it doesn't join the ring:

[node2] nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns     Token
node2.ip   datacenter1  rack1  Up      Normal  13.49 KB  100.00%  85070591730234615865843651857942052864

I'm attaching the full log from node2's startup in debug mode.

PS. When I didn't specify the initial token on node2, I ended up with an exception like this:

Exception encountered during startup: No other nodes seen! Unable to bootstrap. If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed. Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster. Usually, this can be solved by giving all nodes the same seed list.

I'm not sure how to proceed now. I found a couple of posts with problems like this, but they weren't very useful.

-- regards, Jakub Glapa
Re: Unsuccessful attempt to add a second node to a ring.
Jakub,

Have you set the data, commitlog, and saved-caches directories to different ones in each yaml file for each node?

Regards,
Roshni

From: Jakub Glapa <jakub.gl...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Unsuccessful attempt to add a second node to a ring.
Re: Unsuccessful attempt to add a second node to a ring.
Hi Roshni,
no, they are the same. My changes in cassandra.yaml were only to the listen_address, rpc_address, seeds and initial_token fields. The rest is exactly the same as on node1. That's how the file looks on node2:

cluster_name: 'Test Cluster'
initial_token: 85070591730234615865843651857942052864
hinted_handoff_enabled: true
hinted_handoff_throttle_delay_in_ms: 1
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
    - /data/servers/cassandra_sbe_edtool/cassandra_data/data
commitlog_directory: /data/servers/cassandra_sbe_edtool/cassandra_data/commitlog
saved_caches_directory: /data/servers/cassandra_sbe_edtool/cassandra_data/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: NODE1
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
sliced_buffer_size_in_kb: 64
storage_port: 7000
ssl_storage_port: 7001
listen_address: NODE2
rpc_address: NODE2
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 1
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra

-- regards, pozdrawiam, Jakub Glapa

On Wed, Aug 1, 2012 at 10:29 AM, Roshni Rajagopal <roshni.rajago...@wal-mart.com> wrote:
Jakub, Have you set the Data, commitlog, saved cache directories to different ones in each yaml file for each node? Regards, Roshni
Re: Unsuccessful attempt to add a second node to a ring.
Ok, sorry, it may not be required. I was thinking of a configuration I had done on my local laptop, where I had aliased my IP address; in that case the directories and JMX port needed to be different. The cluster name is the same, right?

From: Jakub Glapa <jakub.gl...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Unsuccessful attempt to add a second node to a ring.
Re: Unsuccessful attempt to add a second node to a ring.
Yes, it's the same.

-- regards, pozdrawiam, Jakub Glapa

On Wed, Aug 1, 2012 at 11:24 AM, Roshni Rajagopal <roshni.rajago...@wal-mart.com> wrote:
Ok, sorry, it may not be required. I was thinking of a configuration I had done on my local laptop, where I had aliased my IP address; in that case the directories and JMX port needed to be different. The cluster name is the same, right?
Restore snapshot
Hi, is it possible to restore a snapshot of a keyspace on a live Cassandra cluster (I mean, without restarting)?
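For what it's worth, a rough sketch of one common approach. The paths, keyspace and column-family names below are placeholders, and `nodetool refresh` (which loads newly placed sstables without a restart) may not exist in older releases — check your version; otherwise a rolling restart is needed:

```shell
# Hypothetical layout -- adjust the data directory, keyspace (KS),
# column family (CF) and snapshot tag (SNAP) to your setup.
DATA=/var/lib/cassandra/data
KS=MyKeyspace
CF=MyColumnFamily
SNAP=my_snapshot_tag

# Copy the snapshot's sstable files back into the live data directory
cp "$DATA/$KS/snapshots/$SNAP/"*.db "$DATA/$KS/"

# Ask the running node to pick up the newly placed sstables (no restart);
# only works if your nodetool has the "refresh" command
nodetool -h localhost refresh "$KS" "$CF"
```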
Re: Does Cassandra support operations in a transaction?
Hi Ivan,

No, Cassandra does not support transactions. I believe each operation is atomic: if an operation returns a successful result, then it worked. You can't do things like binding two operations together and guaranteeing that if either fails, they both fail. You will find that Cassandra doesn't do a lot of things compared to a SQL db :-) But it does write a lot of data quickly.

-g

On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang <wiwi1...@gmail.com> wrote:
Hi, I am new to Cassandra. I wonder if it is possible to call Cassandra in one transaction, such as in a relational DB. Thanks in advance. Best Regards, Ivan Jiang
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
Just for information: we are running on 1.1.2. JNA or not had no difference; manually calling a full GC had no difference; but in my case the reduction of commitlog_total_space_in_mb to 2048 (from the default 4096) makes the difference.

On 07/26/2012 04:27 PM, Mina Naguib wrote:

Hi Thomas

On a modern 64-bit server, I recommend you pay little attention to the virtual size. It's made up of almost everything within the process's address space, including on-disk files mmap()ed in for zero-copy access. It's not unreasonable for a machine with N amount of RAM to have a process whose virtual size is several times the value of N. That in and of itself is not problematic.

In a default cassandra 1.1.x setup, the bulk of that will be your sstables' data and index files. On linux you can invoke the pmap tool on the cassandra process's PID to see what's in there. Much of it will be anonymous memory allocations (the JVM heap itself, off-heap data structures, etc), but lots of it will be references to files on disk (binaries, libraries, mmap()ed files, etc).

What's more important to keep an eye on is the JVM heap - typically statically allocated to a fixed size at cassandra startup. You can get info about its used/capacity values via "nodetool -h localhost info". You can also hook up jconsole and trend it over time.

The other critical piece is the process's RESident memory size, which includes the JVM heap but also other off-heap data structures and miscellanea. Cassandra has recently been making more use of off-heap structures (for example, row caching via SerializingCacheProvider). This is done as a matter of efficiency - a serialized off-heap row is much smaller than a classical object sitting in the JVM heap - so you can do more with less.

Unfortunately, in my experience, it's not perfect. They still have a cost, in terms of on-heap usage, as well as off-heap growth over time. Specifically, my experience with cassandra 1.1.0 showed that off-heap row caches incurred a very high on-heap cost (ironic) - see my post at http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E - as documented in that email, I managed that with regularly scheduled full GC runs via System.gc().

I have, since then, moved away from scheduled System.gc() to scheduled row cache invalidations. While this had the same effect as the System.gc() I described in my email, it eliminated the 20-30 second pause associated with it. It did however introduce (or maybe I never noticed it earlier) a slow creep in memory usage outside of the heap. It's typical in my case, for example, for a process configured with 6G of JVM heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly throughout a week to the 10-11GB range. Depending on what else the box is doing, I've experienced the linux OOM killer killing cassandra as you've described, or heavy swap usage bringing everything down (we're latency-sensitive), etc.

And now for the good news. Since I've upgraded to 1.1.2:
1. There's no more need for regularly scheduled System.gc()
2. There's no more need for regularly scheduled row cache invalidation
3. The HEAP usage within the JVM is stable over time
4. The RESident size of the process appears also stable over time

Point #4 above is still pending, as I only have 3-day graphs since the upgrade, but they show promising results compared to the slope of the same graph before the upgrade to 1.1.2.

So my advice is give 1.1.2 a shot - just be mindful of https://issues.apache.org/jira/browse/CASSANDRA-4411

On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:

I saw this. All works fine up to version 1.1.0: 0.8.x takes 5 GB of memory on an 8 GB machine, 1.0.x takes between 6 and 7 GB on an 8 GB machine, but 1.1.0 takes it all, and that is a problem for me. It is no solution to wait for the OOM killer from the linux kernel and restart the Cassandra process; when my machine has less than 100 MB of RAM available, I have a problem.

On 07/25/2012 07:06 PM, Tyler Hobbs wrote:

Are you actually seeing any problems from this? High virtual memory usage on its own really doesn't mean anything. See http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <thomas.speng...@toptarif.de> wrote:

No one has any idea? We tried updating to 1.1.2, with DiskAccessMode standard, indexAccessMode standard, row_cache_size_in_mb: 0, key_cache_size_in_mb: 0. Our next try will be to change SerializingCacheProvider to ConcurrentLinkedHashCacheProvider. Any other proposals are welcome.

On 07/04/2012 02:13 PM, Thomas Spengler wrote:

Hi @all, since our upgrade from cassandra 1.0.3 to 1.1.0 the virtual memory usage of the cassandra nodes explodes. Our setup is:
* 5 centos 5.8 nodes
* each with 4 CPUs and 8 GB RAM
* each node holds about
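To make the virtual-vs-resident distinction above concrete, here is a rough sketch of the inspection commands mentioned in the thread (Linux only; the "CassandraDaemon" process-name pattern is an assumption about your install):

```shell
# Find the Cassandra PID (the "CassandraDaemon" pattern is an assumption)
PID=$(pgrep -f CassandraDaemon | head -n1)

# Kernel's view: VmSize (virtual, includes mmap()ed sstables) vs VmRSS (resident)
grep -E 'VmSize|VmRSS' "/proc/$PID/status"

# Break down the virtual size: anonymous memory, libraries, mmap()ed files;
# sort by mapping size and show the largest entries
pmap -x "$PID" | sort -k2 -n | tail -20

# JVM heap used/capacity as reported by Cassandra itself
nodetool -h localhost info
```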
Re: Unsuccessful attempt to add a second node to a ring.
I found a similar thread from March: http://www.mail-archive.com/user@cassandra.apache.org/msg21007.html

For me, clearing the data and starting from the beginning didn't help. It's interesting because on my dev environment I was able to add another node without any problems. The only difference is that the second node now is in a different data center (but I'm not using any different settings, SimpleSnitch). Ports 7000, 9160 and 7199 were open between those 2 nodes.

How else can I check if the communication between those 2 nodes is working? In the logs I see:

DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642 OutboundTcpConnection.java (line 206) attempting to connect to NODE1/node1.ip

So I assume that the communication is somehow established?

-- regards, Jakub Glapa
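Beyond a single telnet check, a minimal script-form probe of the ports involved (a sketch: `node1` is a placeholder hostname, and this only proves TCP reachability, not that gossip itself works):

```shell
#!/usr/bin/env bash
# Report whether a TCP connection to host:port succeeds.
# Uses bash's /dev/tcp; "timeout" bounds the wait on filtered ports.
check_port() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}

check_port node1 7000   # storage_port: gossip / internode traffic
check_port node1 9160   # rpc_port: Thrift clients
check_port node1 7199   # JMX, used by nodetool
```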
Re: Creating counter columns in cassandra
Hi All, I faced this same problem when trying to query counter values. I am using a phone number as the row key and updating the number of calls taken to that number. So my query is like:

SELECT KEY FROM columnFamily WHERE No_of_Calls5

This returns no data and no exception, though I am 100% sure there are entries which satisfy the query. I used the same code Amila mentioned. My suspicion is that this is due to some mismatch between the counter value representation and the query value, but I have failed to resolve it. :( Any ideas or guidance are greatly appreciated. Thanks in advance!

On Tue, Jul 31, 2012 at 1:49 PM, Amila Paranawithana amila1...@gmail.com wrote:

Hi all, Thanks all for the valuable feedback. I have a problem with running queries with cqlsh. My query is:

SELECT * FROM rule1 WHERE sms=3;

java.lang.NumberFormatException: An hex string representing bytes must have an even length
        at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
        at org.apache.cassandra.utils.ByteBufferUtil.hexToBytes(ByteBufferUtil.java:501)
        at org.apache.cassandra.db.marshal.CounterColumnType.fromString(CounterColumnType.java:57)
        at org.apache.cassandra.cql.Term.getByteBuffer(Term.java:96)
        at org.apache.cassandra.cql.QueryProcessor.multiRangeSlice(QueryProcessor.java:185)
        at org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:484)
        at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:877)
        at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1235)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

but when I say SELECT * FROM rule1 WHERE sms=03; no exception is shown, yet though I have entries where the sms count = 3, that entry is not retrieved. And for queries like SELECT * FROM rule1 WHERE sms>03; I get: Bad Request: No indexed columns present in by-columns clause with equals operator. Can anyone recognize the problem here?

Following are the methods I used:

// for indexing columns
void indexColumn(String idxColumnName, String CountercfName) {
    Cluster cluster = HFactory.getOrCreateCluster(BasicConf.CASSANDRA_CLUSTER,
            BasicConf.CLUSTER_PORT);
    KeyspaceDefinition keyspaceDefinition = cluster.describeKeyspace(BasicConf.KEYSPACE);
    List<ColumnFamilyDefinition> cdfs = keyspaceDefinition.getCfDefs();
    ColumnFamilyDefinition cfd = null;
    for (ColumnFamilyDefinition c : cdfs) {
        if (c.getName().equals(CountercfName)) {
            System.out.println(c.getName());
            cfd = c;
            break;
        }
    }
    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(cfd);
    BasicColumnDefinition bcdf = new BasicColumnDefinition();
    bcdf.setName(StringSerializer.get().toByteBuffer(idxColumnName));
    bcdf.setIndexName(idxColumnName + "index");
    bcdf.setIndexType(ColumnIndexType.KEYS);
    bcdf.setValidationClass(ComparatorType.COUNTERTYPE.getClassName());
    columnFamilyDefinition.addColumnDefinition(bcdf);
    cluster.updateColumnFamily(new ThriftCfDef(columnFamilyDefinition));
}

// for adding a new counter column
void insertCounterColumn(String cfName, String counterColumnName, String phoneNumberKey) {
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.insertCounter(phoneNumberKey, cfName,
            HFactory.createCounterColumn(counterColumnName, 1L, StringSerializer.get()));
    mutator.execute();
    CounterQuery<String, String> counter = new ThriftCounterColumnQuery<String, String>(
            keyspace, StringSerializer.get(), StringSerializer.get());
    counter.setColumnFamily(cfName).setKey(phoneNumberKey).setName(counterColumnName);
    indexColumn(counterColumnName, cfName);
}

// incrementing counter values
void incrementCounter(String ruleName, String columnName, HashMap<String, Long> entries) {
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    Set<String> keys = entries.keySet();
    for (String s : keys) {
        mutator.incrementCounter(s, ruleName, columnName, entries.get(s));
    }
    mutator.execute();
}

On Sun, Jul 29, 2012 at 3:29 PM, Paolo Bernardi berna...@gmail.com wrote: On Sun, Jul 29, 2012 at 9:30 AM, Abhijit Chanda abhijit.chan...@gmail.com wrote: There should be at least one =
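The NumberFormatException above points at the likely root cause: CounterColumnType.fromString parses the CQL term as a hex byte string. "3" has odd length and is rejected outright, while "03" parses but decodes to the single byte 0x03, which can never equal the 8-byte big-endian long a counter actually stores, hence the silent empty result. A minimal sketch of the encoding mismatch (in Python for brevity; the real parsing lives in Cassandra's Hex.hexToBytes):

```python
import struct

def hex_to_bytes(s):
    # Mirrors org.apache.cassandra.utils.Hex.hexToBytes: rejects odd-length input.
    if len(s) % 2 != 0:
        raise ValueError("An hex string representing bytes must have an even length")
    return bytes.fromhex(s)

# A counter value of 3 is stored as an 8-byte big-endian long.
counter_bytes = struct.pack(">q", 3)
print(counter_bytes.hex())                   # 0000000000000003

print(hex_to_bytes("03") == counter_bytes)   # False: 1 byte vs. 8 bytes
try:
    hex_to_bytes("3")                        # odd length -> the exception in the trace
except ValueError as e:
    print(e)
```

So `WHERE sms=3` fails to parse at all, and `WHERE sms=03` parses but compares against the wrong byte string; the comparison would only line up against the full 16-hex-digit encoding of the value.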
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
Mina, Thanks for that post. Very interesting :-) What sort of things are you graphing? Standard *nix stuff (mem/cpu/etc)? Or do you have some hooks into the C* process (I saw something about port 1414 in the .yaml file). Best, -g

On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib mina.nag...@bloomdigital.com wrote:

Hi Thomas

On a modern 64-bit server, I recommend you pay little attention to the virtual size. It's made up of almost everything within the process's address space, including on-disk files mmap()ed in for zero-copy access. It's not unreasonable for a machine with N amount of RAM to have a process whose virtual size is several times N. That in and of itself is not problematic. In a default cassandra 1.1.x setup, the bulk of that will be your sstables' data and index files. On linux you can invoke the pmap tool on the cassandra process's PID to see what's in there. Much of it will be anonymous memory allocations (the JVM heap itself, off-heap data structures, etc), but lots of it will be references to files on disk (binaries, libraries, mmap()ed files, etc).

What's more important to keep an eye on is the JVM heap - typically statically allocated to a fixed size at cassandra startup. You can get info about its used/capacity values via nodetool -h localhost info. You can also hook up jconsole and trend it over time.

The other critical piece is the process's RESident memory size, which includes the JVM heap but also other off-heap data structures and miscellanea. Cassandra has recently been making more use of off-heap structures (for example, row caching via SerializingCacheProvider). This is done as a matter of efficiency - a serialized off-heap row is much smaller than a classical object sitting in the JVM heap - so you can do more with less. Unfortunately, in my experience, it's not perfect. They still have a cost, in terms of on-heap usage, as well as off-heap growth over time.

Specifically, my experience with cassandra 1.1.0 showed that off-heap row caches incurred a very high on-heap cost (ironic) - see my post at http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E - as documented in that email, I managed that with regularly scheduled full GC runs via System.gc().

I have since moved away from scheduled System.gc() to scheduled row cache invalidations. While this had the same effect as the System.gc() I described in my email, it eliminated the 20-30 second pause associated with it. It did, however, introduce (or maybe I never noticed it earlier) a slow creep in memory usage outside of the heap. It's typical in my case, for example, for a process configured with 6G of JVM heap to start up, stabilize at 6.5-7GB resident usage, then creep up slowly throughout a week to the 10-11GB range. Depending on what else the box is doing, I've experienced the linux OOM killer killing cassandra as you've described, or heavy swap usage bringing everything down (we're latency-sensitive), etc.

And now for the good news. Since I've upgraded to 1.1.2:
1. There's no more need for regularly scheduled System.gc()
2. There's no more need for regularly scheduled row cache invalidation
3. The HEAP usage within the JVM is stable over time
4. The RESident size of the process appears also stable over time

Point #4 above is still pending, as I only have 3-day graphs since the upgrade, but they show promising results compared to the slope of the same graph before the upgrade to 1.1.2. So my advice is give 1.1.2 a shot - just be mindful of https://issues.apache.org/jira/browse/CASSANDRA-4411

On 2012-07-26, at 2:18 AM, Thomas Spengler wrote: I saw this.
All works fine up to version 1.1.0: the 0.8.x takes 5GB of memory on an 8GB machine, the 1.0.x takes between 6 and 7GB on an 8GB machine, and the 1.1.0 takes all of it - and that is a problem for me. It is no solution to wait for the OOM killer from the linux kernel and restart the cassandra process; when my machine has less than 100MB of RAM available, I have a problem.

On 07/25/2012 07:06 PM, Tyler Hobbs wrote: Are you actually seeing any problems from this? High virtual memory usage on its own really doesn't mean anything. See http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler thomas.speng...@toptarif.de wrote: No one has any idea? We tried updating to 1.1.2 with DiskAccessMode standard, indexAccessMode standard, row_cache_size_in_mb: 0, key_cache_size_in_mb: 0. Our next try will be to change SerializingCacheProvider to ConcurrentLinkedHashCacheProvider. Any other proposals are welcome.

On 07/04/2012 02:13 PM, Thomas Spengler wrote: Hi @all, since our upgrade from cassandra 1.0.3 to 1.1.0 the virtual memory usage of the cassandra nodes explodes. Our setup is: * 5 - centos 5.8 nodes * each 4
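Mina's distinction between virtual size and resident size can be checked directly from /proc on Linux. A rough sketch (Python; point it at /proc/<cassandra-pid>/status instead of the current process for a real check):

```python
def memory_kb(field, status_path="/proc/self/status"):
    # field is e.g. "VmRSS" (resident set) or "VmSize" (virtual address space);
    # /proc reports both in kB.
    with open(status_path) as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return None

rss = memory_kb("VmRSS")
vsz = memory_kb("VmSize")
print(f"resident={rss} kB, virtual={vsz} kB")
```

On a 64-bit JVM with mmap()ed sstables, the virtual figure can be many times the resident one without anything being wrong; resident size (plus swap activity) is the number worth alerting on.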
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
All our servers (cassandra and otherwise) get monitored with nagios + get many basic metrics graphed by pnp4nagios. This covers a large chunk of a box's health, as well as cassandra basics (specifically the pending tasks and JVM heap state). IMO it's not possible to clearly debug a cassandra issue if you don't have a good holistic view of the boxes' health (CPU, RAM, swap, disk throughput, etc.)

Separate from that we have an operational dashboard. It's a bunch of manually-defined RRD files and custom scripts that grab metrics, store them, and graph the health of various layers in the infrastructure in an easy-to-digest way (for example, each data center gets a color scheme - stacked machines within multiple DCs can just be eyeballed). There we can see, for example, our total read volume, total write volume, struggling boxes, dynamic endpoint snitch reaction, etc.

Finally, almost all the software we write integrates with statsd + graphite. In graphite we have more metrics than we know what to do with, but it's better than the other way around. From there, for example, we can see cassandra's response time including things cassandra itself can't measure (network, thrift, etc), across the various client softwares that talk to it. Within graphite we have several dashboards defined (users make their own; some infrastructure components have shared dashboards.)

-- Mina Naguib :: Director, Infrastructure Engineering Bloom Digital Platforms :: T 514.394.7951 #208 http://bloom-hq.com/

On 2012-08-01, at 3:43 PM, Greg Fausak wrote: Mina, Thanks for that post. Very interesting :-) What sort of things are you graphing? Standard *nix stuff (mem/cpu/etc)? Or do you have some hooks into the C* process (I saw something about port 1414 in the .yaml file). Best, -g
Re: Does Cassandra support operations in a transaction?
Hi Greg, Thank you for your answers. I will have to shift my thinking from relational SQL to NoSQL while using Cassandra. Best Regards, Ivan

On Wed, Aug 1, 2012 at 9:20 PM, Greg Fausak g...@named.com wrote: Hi Ivan, No, Cassandra does not support transactions. I believe each operation is atomic: if that operation returns a successful result, then it worked. You can't do things like bind two operations together and guarantee that if either fails, they both fail. You will find that Cassandra doesn't do a lot of things compared to a sql db :-) But it does write a lot of data quickly. -g

On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang wiwi1...@gmail.com wrote: Hi, I am new to Cassandra, and I wonder if it is possible to call Cassandra within one transaction such as in a relational DB. Thanks in advance. Best Regards, Ivan Jiang
Re: Does Cassandra support operations in a transaction?
Hi Ivan, Cassandra supports 'tunable consistency'. If you always read and write at quorum (or local quorum for multi-data-center setups), you can guarantee that the results will be consistent: all the replicas' data will be compared, the latest will be returned, and no data will be out of date. This comes at a loss of performance - it is fastest to just read and write at consistency level one rather than check a quorum of nodes. What you choose depends on what your application needs. Is it OK if some users receive out-of-date data (it isn't earth-shattering if someone doesn't know what you're eating right now), or is it a banking transaction system where all entities must be consistently updated?

Design in cassandra prioritizes de-normalization. You cannot have referential integrity guaranteeing that 2 tables (column families in cassandra) are in sync the way a database designed with foreign keys does. The application needs to ensure that all data in the column families is accurate and not out of sync, because data elements may be duplicated in different column families. You cannot have 2 different entities and ensure that changes to both will be done and only then be visible to others.

Regards,

From: Jeffrey Kesselman jef...@gmail.com
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Does Cassandra support operations in a transaction?

Short story is that few if any of the NoSQL systems support transactions natively. That's one of the big compromises they make. What they call eventual consistency is actually eventual Durability in ACID terms. Consistency, as meant by the C in ACID, is not guaranteed at all.
On Wed, Aug 1, 2012 at 6:21 AM, Ivan Jiang wiwi1...@gmail.com wrote: Hi, I am new to Cassandra, and I wonder if it is possible to call Cassandra within one transaction such as in a relational DB. Thanks in advance. Best Regards, Ivan Jiang

-- It's always darkest just before you are eaten by a grue.

This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***
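The quorum guarantee Roshni describes is just set arithmetic: with replication factor N, a read of R replicas and a write of W replicas are guaranteed to overlap whenever R + W > N, so every read set contains at least one replica holding the latest write. A toy check (Python; the helper names are made up for illustration, this is not a Cassandra API):

```python
from itertools import combinations

def quorum(n):
    # Quorum of n replicas: a strict majority.
    return n // 2 + 1

def always_overlaps(n, r, w):
    # True iff every possible r-replica read set intersects
    # every possible w-replica write set.
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

n = 3
print(always_overlaps(n, quorum(n), quorum(n)))  # True: QUORUM/QUORUM is consistent
print(always_overlaps(n, 1, 1))                  # False: ONE/ONE can miss a write
```

Any R and W with R + W > N gives the same guarantee; quorum reads plus quorum writes are just the balanced choice.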
Re: Looking for a good Ruby client
Harry, we're in a similar situation and are starting to work out our own ruby client. The biggest issue is that it doesn't make much sense to build a higher-level abstraction on anything other than CQL3, given where things are headed - at least that is our opinion. At the same time, CQL3 is only just becoming usable and still seems rather deficient for wide-row usage. The tricky part is that with current CQL3 you have to construct quite complex iterators to retrieve a large result set. That means you end up having to either parse incoming CQL3 to insert the iteration logic, or pass CQL3 fragments in and compose them with iterator clauses. Not fun either way. The only good solution I see is to switch to a streaming protocol (or build some form of continuation on top of thrift) so that the client can ask for a huge result set and the cassandra coordinator can break it into sub-queries as it sees fit and return results chunk by chunk. If that is really the path forward, then any abstraction built above CQL3 before then will either contain a sizable piece of complex code that can eventually be deleted or, worse, an interface that is no longer best practice. Good luck! Thorsten

On 8/1/2012 1:47 PM, Harry Wilkinson wrote: Hi, I'm looking for a Ruby client for Cassandra that is pretty high-level. I am really hoping to find a Ruby gem of high quality that allows a developer to create models like you would with ActiveModel. So far I have figured out that the canonical Ruby client for Cassandra is Twitter's Cassandra gem https://github.com/twitter/cassandra/ of the same name. It looks great - mature, still in active development, etc. No stated support for Ruby 1.9.3 that I can see, but I can probably live with that for now.
What I'm looking for is a higher-level gem built on that one that works like ActiveModel, in that you just include a module in your model class and it gives you methods to declare your model's serialized attributes, plus the usual ActiveModel methods like 'save!', 'valid?', 'find', etc. I've been trying out some different NoSQL databases recently; for example, there is an official Ruby client https://github.com/basho/riak-ruby-client for Riak with a domain model that is close to Riak's, but there's also a gem called 'Ripple' https://github.com/seancribbs/ripple that uses a domain model closer to what most Ruby developers are used to. So it looks like Twitter's Cassandra gem is the one that stays close to the domain model of Cassandra, and what I'm looking for is a gem that's a Cassandra equivalent of Ripple. From some searching I found cassandra_object https://github.com/NZKoz/cassandra_object, which has been inactive for a couple of years, but there's a fork https://github.com/data-axle/cassandra_object that looks like it's being maintained, though I have not found anything to suggest the maintained fork is in general use yet. I have found quite a lot of gems in a similar style that people have started and then not really got very far with. So, does anybody know of a suitable gem? Would you recommend it? Or perhaps you would recommend not using such a gem and sticking with the lower-level client gem? Thanks in advance for your advice. Harry
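The "complex iterators" Thorsten mentions boil down to continuation-based paging: issue a bounded sub-query, remember the last key seen, and ask for the next chunk after it. A language-neutral sketch (Python here; `fetch_page` is a hypothetical callback standing in for a real CQL "WHERE key > :last LIMIT :n" sub-query):

```python
def paged_query(fetch_page, page_size=100):
    """Iterate a large ordered result set chunk by chunk via a continuation key."""
    last = None
    while True:
        page = fetch_page(last, page_size)  # one bounded sub-query
        if not page:
            return
        yield from page
        last = page[-1][0]                  # continue after the last key seen

# Demo against an in-memory "table" of 250 ordered rows.
table = {i: i * i for i in range(250)}

def fetch_page(last, limit):
    keys = sorted(k for k in table if last is None or k > last)
    return [(k, table[k]) for k in keys[:limit]]

rows = list(paged_query(fetch_page, page_size=100))
print(len(rows))  # 250, fetched in three sub-queries
```

This is exactly the logic a pre-streaming CQL3 client has to weave into user-supplied queries, which is why composing it with arbitrary CQL fragments gets messy.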
Re: Does Cassandra support operations in a transaction?
Roshni,

That's not what consistency in ACID means. It's not consistency of reading the same data; it's referential integrity between related pieces of data.

Consistency: Data is in a consistent state when a transaction starts and when it ends. For example, in an application that transfers funds from one account to another, the consistency property ensures that the total value of funds in both accounts is the same at the start and end of each transaction. http://publib.boulder.ibm.com/infocenter/cicsts/v3r2/index.jsp?topic=%2Fcom.ibm.cics.ts.productoverview.doc%2Fconcepts%2Facid.html

A lot of people in the NoSQL world use the term consistency when what they mean is durability.

Durability: After a transaction successfully completes, changes to data persist and are not undone, even in the event of a system failure.

Many NoSQL databases (including Cassandra) are eventually durable, in the sense that a read immediately after a write may not reflect that write, but at some later point it will. None provide true consistency that I am aware of.

On Thu, Aug 2, 2012 at 12:24 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote: Hi Ivan, Cassandra supports 'tunable consistency'...

-- It's always darkest just before you are eaten by a grue.
Re: Does Cassandra support operations in a transaction?
True consistency, by the way, is pretty much only possible in a transactional environment.

On Thu, Aug 2, 2012 at 12:56 AM, Jeffrey Kesselman jef...@gmail.com wrote: Roshni, That's not what consistency in ACID means. It's not consistency of reading the same data; it's referential integrity between related pieces of data...

-- It's always darkest just before you are eaten by a grue.