Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
Hi Everybody!

I'm trying to add a second node to an already operating one node cluster.

Some specs:
- cassandra 1.0.7
- both nodes have a routable listen_address and rpc_address.
- Ports are open: (from node2) telnet node1 7000 is successful
- Seeds parameter on node2 points to node 1.

[node1] nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns      Token
node1.ip   datacenter1  rack1  Up      Normal  74.33 KB  100.00%   0

- initial token on node2 was specified

I see something like this in the logs on node2:

DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76)
collectTimeOrderedData
 INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667)
JOINING: waiting for ring and schema information
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642
OutboundTcpConnection.java (line 206) attempting to connect to
NODE1/node1.ip
DEBUG [ScheduledTasks:1] 2012-07-31 13:50:40,639 LoadBroadcaster.java (line
86) Disseminating load info ...
 INFO [main] 2012-07-31 13:51:08,641 StorageService.java (line 667)
JOINING: schema complete, ready to bootstrap
DEBUG [main] 2012-07-31 13:51:08,642 StorageService.java (line 554) ... got
ring + schema info
 INFO [main] 2012-07-31 13:51:08,642 StorageService.java (line 667)
JOINING: getting bootstrap token
DEBUG [main] 2012-07-31 13:51:08,644 BootStrapper.java (line 138) token
manually specified as 85070591730234615865843651857942052864
DEBUG [main] 2012-07-31 13:51:08,645 Table.java (line 387) applying
mutation of row 4c


but it doesn't join the ring:

[node2] nodetool -h localhost ring
Address    DC           Rack   Status  State   Load      Owns      Token
node2.ip   datacenter1  rack1  Up      Normal  13.49 KB  100.00%   85070591730234615865843651857942052864



I'm attaching the full log from node2 startup in debug mode.



PS.
When I didn't specify the initial token on node2 I ended up with an
exception like this:
Exception encountered during startup: No other nodes seen!  Unable to
bootstrap. If you intended to start a single-node cluster, you should make
sure your broadcast_address (or listen_address) is listed as a seed.
Otherwise, you need to determine why the seed being contacted has no
knowledge of the rest of the cluster.  Usually, this can be solved by
giving all nodes the same seed list.


I'm not sure how to proceed now. I found a couple of posts describing
problems like this, but they weren't very helpful.

--
regards,
Jakub Glapa




Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Roshni Rajagopal
Jakub,

Have you set the data, commitlog, and saved-caches directories to
different ones in each node's yaml file?

Regards,
Roshni




Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
Hi Roshni,
no, they are the same; my changes in cassandra.yaml were only to the
listen_address, rpc_address, seeds and initial_token fields.
The rest is exactly the same as on node1.

That's how the file looks on node2:



cluster_name: 'Test Cluster'
initial_token: 85070591730234615865843651857942052864
hinted_handoff_enabled: true
hinted_handoff_throttle_delay_in_ms: 1
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
- /data/servers/cassandra_sbe_edtool/cassandra_data/data
commitlog_directory: /data/servers/cassandra_sbe_edtool/cassandra_data/commitlog
saved_caches_directory: /data/servers/cassandra_sbe_edtool/cassandra_data/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: NODE1
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
sliced_buffer_size_in_kb: 64
storage_port: 7000
ssl_storage_port: 7001
listen_address: NODE2
rpc_address: NODE2
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 1
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra




--
regards,
pozdrawiam,
Jakub Glapa



Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Roshni Rajagopal
OK, sorry, it may not be required. I was thinking of a configuration I had
done on my local laptop, where I had aliased my IP address. In that case
the directories and JMX port needed to be different.

The cluster name is the same, right?



Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
Yes, it's the same.


--
regards,
pozdrawiam,
Jakub Glapa



Restore snapshot

2012-08-01 Thread Desimpel, Ignace
Hi,

Is it possible to restore a snapshot of a keyspace on a live cassandra cluster 
(I mean without restarting)?



Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Greg Fausak
Hi Ivan,

No, Cassandra does not support transactions.

I believe each operation is atomic.  If that operation returns
a successful result, then it worked.  You can't do things like
bind two operations together and guarantee that if either fails,
they both fail.

You will find that Cassandra doesn't do a lot of things compared to a SQL db :-)

But, it does write a lot of data quickly.
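
For illustration, a minimal sketch using the Hector client (the cluster,
keyspace and column family names here are made up): columns written under
one row key in a single mutation are applied atomically, but writes to
different rows cannot be tied together.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class AtomicityDemo {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "node1:9160");
        Keyspace keyspace = HFactory.createKeyspace("demo", cluster);
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        // Both columns target the SAME row key, so they are applied atomically.
        m.addInsertion("row1", "cf1", HFactory.createStringColumn("a", "1"));
        m.addInsertion("row1", "cf1", HFactory.createStringColumn("b", "2"));
        m.execute();
        // A write to a DIFFERENT row is independent: there is no way to make
        // it succeed or fail together with the mutations above.
        m.addInsertion("row2", "cf1", HFactory.createStringColumn("a", "1"));
        m.execute();
    }
}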

-g


On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang wiwi1...@gmail.com wrote:
 Hi,
 I am a new guy to Cassandra, I wonder if it is possible to call Cassandra in
 one transaction such as in a relational DB.

 Thanks in advance.

 Best Regards,
 Ivan Jiang


Re: virtual memory of all cassandra-nodes is growing extremely since Cassandra 1.1.0

2012-08-01 Thread Thomas Spengler
Just for information:

we are running on 1.1.2
JNA or not made no difference
manually calling full GC made no difference

but in my case

reducing commitlog_total_space_in_mb to 2048 (from the default 4096)
made the difference.




On 07/26/2012 04:27 PM, Mina Naguib wrote:
 
 Hi Thomas
 
 On a modern 64bit server, I recommend you pay little attention to the virtual 
 size.  It's made up of almost everything within the process's address space, 
 including on-disk files mmap()ed in for zero-copy access.  It's not 
 unreasonable for a machine with N amount of RAM to have a process whose 
 virtual size is several times the value of N.  That in and of itself is not 
 problematic.
 
 In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
 data and index files.  On linux you can invoke the pmap tool on the 
 cassandra process's PID to see what's in there.  Much of it will be anonymous 
 memory allocations (the JVM heap itself, off-heap data structures, etc), but 
 lots of it will be references to files on disk (binaries, libraries, mmap()ed 
 files, etc).
 
 What's more important to keep an eye on is the JVM heap - typically 
 statically allocated to a fixed size at cassandra startup.  You can get info 
 about its used/capacity values via nodetool -h localhost info.  You can 
 also hook up jconsole and trend it over time.
 
 The other critical piece is the process's RESident memory size, which 
 includes the JVM heap but also other off-heap data structures and 
 miscellanea.  Cassandra has recently been making more use of off-heap 
 structures (for example, row caching via SerializingCacheProvider).  This is 
 done as a matter of efficiency - a serialized off-heap row is much smaller 
 than a classical object sitting in the JVM heap - so you can do more with 
 less.
 
 Unfortunately, in my experience, it's not perfect.  They still have a cost, 
 in terms of on-heap usage, as well as off-heap growth over time.
 
 Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
 caches incurred a very high on-heap cost (ironic) - see my post at 
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
  - as documented in that email, I managed that with regularly scheduled full 
 GC runs via System.gc()
 
 I have, since then, moved away from scheduled System.gc() to scheduled row 
 cache invalidations.  While this had the same effect as System.gc() I 
 described in my email, it eliminated the 20-30 second pause associated with 
 it.  It did however introduce (or may be I never noticed earlier), slow creep 
 in memory usage outside of the heap.
 
 It's typical in my case for example for a process configured with 6G of JVM 
 heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
 throughout a week to 10-11GB range.  Depending on what else the box is doing, 
 I've experienced the linux OOM killer killing cassandra as you've described, 
 or heavy swap usage bringing everything down (we're latency-sensitive), etc..
 
 And now for the good news.  Since I've upgraded to 1.1.2:
   1. There's no more need for regularly scheduled System.gc()
   2. There's no more need for regularly scheduled row cache invalidation
   3. The HEAP usage within the JVM is stable over time
   4. The RESident size of the process appears also stable over time
 
 Point #4 above is still pending as I only have 3 day graphs since the 
 upgrade, but they show promising results compared to the slope of the same 
 graph before the upgrade to 1.1.2
 
 So my advice is give 1.1.2 a shot - just be mindful of 
 https://issues.apache.org/jira/browse/CASSANDRA-4411
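 
 For reference, a minimal Java sketch of reading the heap used/capacity
 values mentioned above via JMX (standard library only; run it inside the
 JVM you want to inspect, or adapt it with a remote JMX connector for the
 cassandra process):
 
 import java.lang.management.ManagementFactory;
 import java.lang.management.MemoryUsage;
 
 public class HeapCheck {
     public static void main(String[] args) {
         // The same used/committed/max numbers that "nodetool info" and
         // jconsole report for the JVM heap.
         MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
         System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                 heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
     }
 }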
 
 
 On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
 
 I saw this.

 All works fine up to version 1.1.0:
 the 0.8.x takes 5GB of memory on an 8GB machine,
 the 1.0.x takes between 6 and 7 GB on an 8GB machine,
 and
 the 1.1.0 takes it all.

 And it is a problem:
 for me it is no solution to wait for the OOM killer from the linux kernel
 and restart the cassandra process.

 When my machine has less than 100MB ram available then I have a problem.



 On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
 Are you actually seeing any problems from this? High virtual memory usage
 on its own really doesn't mean anything. See
 http://wiki.apache.org/cassandra/FAQ#mmap

 On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler 
 thomas.speng...@toptarif.de wrote:

 No one has any idea?

 we tried

 update to 1.1.2
 DiskAccessMode standard, indexAccessMode standard
 row_cache_size_in_mb: 0
 key_cache_size_in_mb: 0


 Our next try will be to change

 SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

 any other proposals are welcome

 On 07/04/2012 02:13 PM, Thomas Spengler wrote:
 Hi @all,

 since our upgrade from cassandra 1.0.3 to 1.1.0 the virtual memory usage
 of the cassandra-nodes explodes

 our setup is:
 * 5 - centos 5.8 nodes
 * each 4 CPU's and 8 GB RAM
 * each node holds about 

Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
I found a similar thread from March:
http://www.mail-archive.com/user@cassandra.apache.org/msg21007.html

For me, clearing the data and starting from the beginning didn't help.

It's interesting because on my dev environment I was able to add another
node without any problems.

The only difference is that the second node is now in a different data
center (but I'm not using any different settings, just SimpleSnitch).
Ports 7000, 9160 and 7199 were open between those 2 nodes.

How else can I check if the communication between those 2 nodes is working?
In the logs I see that:
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642
OutboundTcpConnection.java (line 206) attempting to connect to
NODE1/node1.ip

So I assume that the communication is somehow established?
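
For a quick programmatic check, here is a minimal Java sketch that does the
same thing as the telnet test against the storage port (the host comes from
the command line; the 5 second timeout is arbitrary):

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) throws Exception {
        // args[0] = host to test, e.g. node1; 7000 is the storage port.
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(args[0], 7000), 5000);
            System.out.println("TCP connect to " + args[0] + ":7000 OK");
        } finally {
            s.close();
        }
    }
}

A successful connect only proves the port is reachable; it doesn't confirm
that gossip is actually exchanging ring state.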


--
regards,
Jakub Glapa



Re: Creating counter columns in cassandra

2012-08-01 Thread Pushpalanka Jayawardhana
Hi All,

I faced this same problem when trying to query the counter values. I am
using a phone number as row key and updating the number of calls made to
that number. So my query is like

SELECT KEY FROM columnFamily WHERE No_of_Calls > 5

This returns no data and no exception, though I am 100% sure that there
are entries which satisfy that query.
I used the same code as Amila mentioned. My suspicion is that this is due
to some type mismatch between the counter value representation and the
query value, but I failed to resolve it. :(

Any ideas or guidance is greatly helpful.
Thanks in advance!
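
One way to narrow down a type mismatch is to read the counter back directly
by key, bypassing the index. Below is a sketch reusing the CounterQuery
pattern from Amila's code further down; the cluster, keyspace, column family
and row key values are placeholders:

import me.prettyprint.cassandra.model.thrift.ThriftCounterColumnQuery;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.CounterQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class CounterReadBack {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "node1:9160");
        Keyspace keyspace = HFactory.createKeyspace("demo", cluster);
        // Fetch the raw counter value for one phone-number row key.
        CounterQuery<String, String> q =
                new ThriftCounterColumnQuery<String, String>(
                        keyspace, StringSerializer.get(), StringSerializer.get());
        q.setColumnFamily("columnFamily").setKey("15551234567").setName("No_of_Calls");
        QueryResult<HCounterColumn<String>> r = q.execute();
        System.out.println(r.get() == null ? "no counter found" : r.get().getValue());
    }
}

If the raw value comes back as expected, the problem is likely in how the
index query interprets the comparison value rather than in the stored data.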


On Tue, Jul 31, 2012 at 1:49 PM, Amila Paranawithana amila1...@gmail.comwrote:

 Hi all,
 Thanks all for the valuable feedback. I have a problem with running
 queries with Cqlsh.
 My query is  SELECT * FROM rule1 WHERE sms=3;

 java.lang.NumberFormatException: An hex string representing bytes must
 have an even length
  at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
 at
 org.apache.cassandra.utils.ByteBufferUtil.hexToBytes(ByteBufferUtil.java:501)
  at
 org.apache.cassandra.db.marshal.CounterColumnType.fromString(CounterColumnType.java:57)
  at org.apache.cassandra.cql.Term.getByteBuffer(Term.java:96)
 at
 org.apache.cassandra.cql.QueryProcessor.multiRangeSlice(QueryProcessor.java:185)
  at
 org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:484)
 at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:877)
  at
 org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1235)
  at
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
  at
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
  at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)

 but when I say SELECT * FROM rule1 WHERE sms=03; no exceptions are shown.
 But though I have entries where the sms count is 3, that entry is not retrieved.

 And for queries like SELECT * FROM rule1 WHERE sms>03; I get
 Bad Request: No indexed columns present in by-columns clause with equals
 operator

 Can anyone recognize the problem here?

 Following are the methods I used.

 // for indexing columns
 void indexColumn(String idxColumnName, String counterCfName) {

     Cluster cluster = HFactory.getOrCreateCluster(
             BasicConf.CASSANDRA_CLUSTER, BasicConf.CLUSTER_PORT);
     KeyspaceDefinition keyspaceDefinition =
             cluster.describeKeyspace(BasicConf.KEYSPACE);

     List<ColumnFamilyDefinition> cdfs = keyspaceDefinition.getCfDefs();
     ColumnFamilyDefinition cfd = null;
     for (ColumnFamilyDefinition c : cdfs) {
         if (c.getName().toString().equals(counterCfName)) {
             System.out.println(c.getName());
             cfd = c;
             break;
         }
     }

     BasicColumnFamilyDefinition columnFamilyDefinition =
             new BasicColumnFamilyDefinition(cfd);

     BasicColumnDefinition bcdf = new BasicColumnDefinition();
     bcdf.setName(StringSerializer.get().toByteBuffer(idxColumnName));
     bcdf.setIndexName(idxColumnName + "index");
     bcdf.setIndexType(ColumnIndexType.KEYS);
     bcdf.setValidationClass(ComparatorType.COUNTERTYPE.getClassName());

     columnFamilyDefinition.addColumnDefinition(bcdf);
     cluster.updateColumnFamily(new ThriftCfDef(columnFamilyDefinition));
 }

 // for adding a new counter column
 void insertCounterColumn(String cfName, String counterColumnName,
         String phoneNumberKey) {

     Mutator<String> mutator = HFactory.createMutator(keyspace,
             StringSerializer.get());
     mutator.insertCounter(phoneNumberKey, cfName, HFactory
             .createCounterColumn(counterColumnName, 1L,
                     StringSerializer.get()));
     mutator.execute();
     CounterQuery<String, String> counter =
             new ThriftCounterColumnQuery<String, String>(
                     keyspace, StringSerializer.get(), StringSerializer.get());
     counter.setColumnFamily(cfName).setKey(phoneNumberKey)
             .setName(counterColumnName);

     indexColumn(counterColumnName, cfName);
 }

 // incrementing counter values
 void incrementCounter(String ruleName, String columnName,
         HashMap<String, Long> entries) {

     Mutator<String> mutator = HFactory.createMutator(keyspace,
             StringSerializer.get());

     Set<String> keys = entries.keySet();
     for (String s : keys) {
         mutator.incrementCounter(s, ruleName, columnName, entries.get(s));
     }

     mutator.execute();
 }



 On Sun, Jul 29, 2012 at 3:29 PM, Paolo Bernardi berna...@gmail.com wrote:

 On Sun, Jul 29, 2012 at 9:30 AM, Abhijit Chanda
 abhijit.chan...@gmail.com wrote:
  There should be at least one = 

Re: virtual memory of all cassandra-nodes is growing extremely since Cassandra 1.1.0

2012-08-01 Thread Greg Fausak
Mina,

Thanks for that post.  Very interesting :-)

What sort of things are you graphing?  Standard *nix stuff
(mem/cpu/etc)?  Or do you
have some hooks into the C* process (I saw something about port 1414
in the .yaml file)?

Best,

-g



Re: virtual memory of all cassandra-nodes is growing extremely since Cassandra 1.1.0

2012-08-01 Thread Mina Naguib

All our servers (cassandra and otherwise) get monitored with nagios + get many 
basic metrics graphed by pnp4nagios.  This covers a large chunk of a box's 
health, as well as cassandra basics (specifically the pending tasks, JVM heap 
state).  IMO it's not possible to clearly debug a cassandra issue if you don't 
have a good holistic view of the boxes' health (CPU, RAM, swap, disk 
throughput, etc.)

Separate from that we have an operational dashboard.  It's a bunch of 
manually-defined RRD files and custom scripts that grab metrics, store, and 
graph the health of various layers in the infrastructure in an 
easy-to-digest way (for example, each data center gets a color scheme - stacked 
machines within multiple DCs can just be eyeballed).  There we can see for 
example our total read volume, total write volume, struggling boxes, dynamic 
endpoint snitch reaction, etc...

Finally, almost all the software we write integrates with statsd + graphite.  
In graphite we have more metrics than we know what to do with, but it's better 
than the other way around.  From there for example we can see cassandra's 
response time including things cassandra itself can't measure (network, thrift, 
etc), across various different client softwares that talk to it.  Within 
graphite we have several dashboards defined (users make their own, some 
infrastructure components have shared dashboards.)
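
As a concrete illustration of the statsd side, here is a minimal Java
sketch that emits one counter increment over UDP (host, port and metric
name are made up; the wire format is statsd's plain name:value|c):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class StatsdPing {
    public static void main(String[] args) throws Exception {
        // One statsd counter increment, sent as a single UDP datagram.
        byte[] payload = "myapp.cassandra.reads:1|c".getBytes("UTF-8");
        DatagramSocket sock = new DatagramSocket();
        try {
            sock.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName("statsd.example.com"), 8125));
        } finally {
            sock.close();
        }
    }
}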


--
Mina Naguib :: Director, Infrastructure Engineering
Bloom Digital Platforms :: T 514.394.7951 #208
http://bloom-hq.com/




Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Ivan Jiang
Hi Greg,

Thank you for your answers.

I will have to shift my thinking from relational SQL to NoSQL while using Cassandra.

Best Regards,
Ivan





Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Roshni Rajagopal
Hi Ivan,

Cassandra supports 'tunable consistency'. If you always read and write at
quorum (or local quorum for a multi data center setup), you can guarantee
that the results will be consistent: all the replicas will be compared and
the latest data will be returned, so nothing you read will be out of date.
This comes at a cost in performance; it is fastest to just read and write
once rather than check a quorum of nodes.
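
As a sketch of what that looks like with the Hector client used elsewhere
in this digest (cluster and keyspace names are illustrative):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class QuorumKeyspace {
    public static void main(String[] args) {
        // Pin both reads and writes to QUORUM, as described above.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "node1:9160");
        Keyspace keyspace = HFactory.createKeyspace("demo", cluster, policy);
        System.out.println("keyspace ready: " + keyspace.getKeyspaceName());
    }
}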

What you choose depends on your application's needs. Is it OK if some
users receive out-of-date data (it isn't earth shattering if someone
doesn't know what you're eating right now), or is it a banking transaction
system where all entities must be consistently updated?

Designing in Cassandra prioritizes denormalization. You cannot rely on
referential integrity to keep 2 tables (column families in Cassandra) in
sync the way a relational database does with foreign keys. The application
needs to ensure that all data in the column families is accurate and not
out of sync, because data elements may be duplicated in different column
families.

You cannot update 2 different entities and ensure that changes to both
will be applied and only then become visible to others.


Regards,


From: Jeffrey Kesselman <jef...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Does Cassandra support operations in a transaction?

Short story is that few if any of the NoSql systems support transactions
natively. That's one of the big compromises they make.  What they call
eventual consistency is actually eventual durability in ACID terms.

Consistency, as meant by the C in ACID, is not guaranteed at all.




--
It's always darkest just before you are eaten by a grue.



Re: Looking for a good Ruby client

2012-08-01 Thread Thorsten von Eicken
Harry, we're in a similar situation and are starting to work out our own
ruby client. The biggest issue is that it doesn't make much sense to
build a higher level abstraction on anything other than CQL3, given
where things are headed. At least this is our opinion.
At the same time, CQL3 is just barely becoming usable and still seems
rather deficient in wide-row usage. The tricky part is that with the
current CQL3 you have to construct quite complex iterators to retrieve a
large result set. Which means that you end up having to either parse
CQL3 coming in to insert the iteration stuff, or you have to pass CQL3
fragments in and compose them together with iterator clauses. Not fun
stuff either way.
The only good solution I see is to switch to a streaming protocol (or
build some form of continue on top of thrift) such that the client can
ask for a huge result set and the cassandra coordinator can break it
into sub-queries as it sees fit and return results chunk-by-chunk. If
this is really the path forward then all abstractions built above CQL3
before that will either have a good piece of complex code that can be
deleted or worse, will have an interface that is no longer best practice.
Good luck!
Thorsten


On 8/1/2012 1:47 PM, Harry Wilkinson wrote:
 Hi,

 I'm looking for a Ruby client for Cassandra that is pretty high-level.
  I am really hoping to find a Ruby gem of high quality that allows a
 developer to create models like you would with ActiveModel.

 So far I have figured out that the canonical Ruby client for Cassandra
 is Twitter's Cassandra gem (https://github.com/twitter/cassandra/) of
 the same name.  It looks great - mature, still in active development,
 etc.  No stated support for Ruby 1.9.3 that I can see, but I can
 probably live with that for now.

 What I'm looking for is a higher-level gem built on that gem that
 works like ActiveModel in that you just include a module in your model
 class and that gives you methods to declare your model's serialized
 attributes and also the usual ActiveModel methods like 'save!',
 'valid?', 'find', etc.

 I've been trying out some different NoSQL databases recently, and for
 example there is an official Ruby client for Riak
 (https://github.com/basho/riak-ruby-client) with a domain
 model that is close to Riak's, but then there's also a gem called
 'Ripple' (https://github.com/seancribbs/ripple) that uses a domain
 model that is closer to what most Ruby developers are used to.  So it
 looks like Twitter's Cassandra gem is the one that stays close to the
 domain model of Cassandra, and what I'm looking for is a gem that's a
 Cassandra equivalent of Ripple.

 From some searching I found cassandra_object
 (https://github.com/NZKoz/cassandra_object), which has been inactive
 for a couple of years, but there's a fork
 (https://github.com/data-axle/cassandra_object) that looks like it's
 being maintained, but I have not found any kind of information to
 suggest the maintained fork is in general use yet.  I have found quite
 a lot of gems of a similar style that people have started and then not
 really got very far with.

 So, does anybody know of a suitable gem?  Would you recommend it?  Or
 perhaps you would recommend not using such a gem and sticking with the
 lower-level client gem?

 Thanks in advance for your advice.

 Harry




Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Jeffrey Kesselman
Roshni,

That's not what consistency in ACID means.  It's not consistency of reading
the same data, it's referential integrity between related pieces of data.

"Consistency: Data is in a consistent state when a transaction starts and
when it ends. For example, in an application that transfers funds from one
account to another, the consistency property ensures that the total value
of funds in both the accounts is the same at the start and end of each
transaction."
http://publib.boulder.ibm.com/infocenter/cicsts/v3r2/index.jsp?topic=%2Fcom.ibm.cics.ts.productoverview.doc%2Fconcepts%2Facid.html

A lot of people in the NoSql world use the term consistency when what
they mean is durability.

"Durability: After a transaction successfully completes, changes to data
persist and are not undone, even in the event of a system failure."

Many NoSql databases (including Cassandra) are eventually durable, in the
sense that a read immediately after a write may not reflect that write,
but at some later point, it will.

None provide true consistency that I am aware of.





-- 
It's always darkest just before you are eaten by a grue.


Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Jeffrey Kesselman
True consistency, btw, is pretty much only possible in a transactional
environment.





-- 
It's always darkest just before you are eaten by a grue.