Re: node restart taking too long

2011-08-16 Thread Teijo Holzer
Hi, yes, we saw exactly the same messages. We got rid of them by doing the following:

* Set all row key caches in your CFs to 0 via cassandra-cli
* Kill Cassandra
* Remove all files in the saved_caches directory
* Start Cassandra
* Slowly bring back row key caches (if desired; we left them ...
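
A minimal shell sketch of the first four steps, assuming a keyspace/CF named MyKeyspace/MyCF and the default saved_caches location (check saved_caches_directory in cassandra.yaml and your init script for the real paths):

    # 1. set the row cache to 0 on each affected CF (names are hypothetical)
    echo "use MyKeyspace; update column family MyCF with rows_cached = 0;" | \
      cassandra-cli --host localhost --batch
    # 2. stop Cassandra
    pkill -f CassandraDaemon
    # 3. remove the persisted caches (path is an assumption)
    rm -f /var/lib/cassandra/saved_caches/*
    # 4. start Cassandra again, then re-enable caches gradually if desired
    /etc/init.d/cassandra start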

Re: Scalability question

2011-08-16 Thread Teijo Holzer
... with subcomparator = BytesType and min_compaction_threshold = 2 and read_repair_chance = 0 and keys_cached = 20 and rows_cached = 50 and default_validation_class = CounterColumnType and replicate_on_write = true; Philippe 2011/8/16 Teijo Holzer thol...@wetafx.co.nz: Hi ...
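
The quoted attributes reassembled into a complete cassandra-cli statement, to be run with something like cassandra-cli --host localhost --batch --file schema.cli. This is a sketch: the keyspace and column family names are placeholders, and column_type Super is inferred from the presence of a subcomparator.

    use MyKeyspace;
    create column family MyCounterCF
      with column_type = 'Super'
      and subcomparator = BytesType
      and min_compaction_threshold = 2
      and read_repair_chance = 0
      and keys_cached = 20
      and rows_cached = 50
      and default_validation_class = CounterColumnType
      and replicate_on_write = true;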

Re: Scalability question

2011-08-15 Thread Teijo Holzer
Hi, we have come across this as well. We continuously run rolling repairs followed by major compactions followed by a GC (or node restart) to get rid of all these SSTable files. Combined with aggressive TTLs on most inserts, the cluster stays nice and lean. You don't want your ...
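
A sketch of that maintenance cycle (hostnames are hypothetical; for the JMX-triggered GC, see the jmxterm one-liner in the 2011-08-04 post further down):

    # rolling maintenance, one node at a time
    for h in node1 node2 node3; do
      nodetool -h $h repair    # anti-entropy repair
      nodetool -h $h compact   # major compaction
      # follow with a JMX gc() or a node restart so the
      # compacted-away SSTable files are actually unlinked
    done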

Re: Unable to repair a node

2011-08-14 Thread Teijo Holzer
Hi, I took the following steps to get a node that refused to repair back under control. WARNING: This resulted in some data loss for us; YMMV depending on your replication factor.

* Turn off all row key caches via cassandra-cli
* Set disk_access_mode: standard in cassandra.yaml
* Kill Cassandra on ...
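
The surviving steps sketched in shell (the preview is cut off after the third step; CF names and file paths are assumptions):

    # turn off row and key caches (names are hypothetical)
    echo "use MyKeyspace; update column family MyCF with rows_cached = 0 and keys_cached = 0;" | \
      cassandra-cli --host localhost --batch
    # switch to standard (non-mmap) disk access; the yaml path is an assumption
    sed -i 's/^disk_access_mode:.*/disk_access_mode: standard/' /etc/cassandra/cassandra.yaml
    # kill Cassandra on the node (the remaining steps are truncated above)
    pkill -f CassandraDaemon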

Re: Unable to repair a node

2011-08-14 Thread Teijo Holzer
... to bootstrap. Cheers, T. On 15/08/11 09:16, Teijo Holzer wrote: Hi, I took the following steps to get a node that refused to repair back under control. ...

SOLVED: Bind the JMX port to a specific IP address/interface

2011-08-08 Thread Teijo Holzer
Hi, I was following this blog about running multiple nodes on the same host: http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html The main nuisance here was that there is no easy way to specify the host IP address for JMX to bind to; it would always bind to all ...
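
For reference, the standard JMX knobs live in cassandra-env.sh (0.7 defaults shown; the IP is a placeholder, and this is a sketch of the stock setup rather than the poster's actual fix). Note that java.rmi.server.hostname only changes the address advertised to RMI clients; the stock agent still listens on all interfaces, which is exactly the nuisance described above.

    # cassandra-env.sh (0.7) -- sketch, not necessarily the poster's fix
    JMX_PORT="8080"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
    # hypothetical per-node address; advertised to clients only, it does
    # not restrict which interface the JMX socket binds to
    JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.101"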

Re: Last manual repair time

2011-08-04 Thread Teijo Holzer
That's simple: set your log level to INFO in log4j-server.properties and then do the following. Start of every repair: grep 'Waiting for repair requests:' system.log. End of every repair: grep 'No neighbors to repair with' system.log. If you perform individual repairs for each keyspace/column ...
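
As commands (the grep patterns are the ones given above; the log path varies by install and is an assumption):

    # one line per repair started
    grep 'Waiting for repair requests:' /var/log/cassandra/system.log
    # one line per repair finished
    grep 'No neighbors to repair with' /var/log/cassandra/system.log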

Force a garbage collection with jmxterm from the shell

2011-08-04 Thread Teijo Holzer
Hi, the following command line triggers a garbage collection via JMX:

    echo 'run -b java.lang:type=Memory gc' | java -jar jmxterm-1.0-alpha-4-uber.jar -l service:jmx:rmi:///jndi/rmi://hostname:8080/jmxrmi -n

It uses http://wiki.cyclopsgroup.org/jmxterm The GC is necessary after a major ...
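
The same one-liner wrapped in a loop to GC a whole cluster node by node (hostnames are hypothetical):

    for h in node1 node2 node3; do
      echo 'run -b java.lang:type=Memory gc' | \
        java -jar jmxterm-1.0-alpha-4-uber.jar \
          -l service:jmx:rmi:///jndi/rmi://$h:8080/jmxrmi -n
    done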

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Hi, try running a major compaction via nodetool on this column family. The number of SSTables seems quite large. Considering the space used, this might take a few hours and might also impact performance. Cheers, T. On 01/08/11 14:23, myreasoner wrote: Hi, my read latency is ...

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Compaction is machine-local, so you need to run it on every node. Do it as a rolling compaction (or in parallel if you can take the performance hit). Cheers, T. On 01/08/11 15:31, myreasoner wrote: If I do ./nodetool -h localhost compact keyspace columnfamily1 it will go out and ...
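
A rolling version of the quoted command (hostnames are hypothetical; drop the keyspace/CF arguments to major-compact everything on the node):

    for h in node1 node2 node3; do
      nodetool -h $h compact keyspace columnfamily1
    done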

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Hi, try 'nodetool -h localhost compact', check progress with 'nodetool -h localhost compactionstats', and check system.log. Cheers, T. On 01/08/11 15:47, myreasoner wrote: Thanks. I did ./nodetool -h localhost compact keyspace columnfamily1, but it came back really quickly and the ...
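
To keep an eye on it (a sketch; the log path is an assumption):

    # poll compaction progress every 30 seconds
    watch -n 30 'nodetool -h localhost compactionstats'
    # or follow the log
    tail -f /var/log/cassandra/system.log | grep -i compact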

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Looks like a broken node; just restart Cassandra on that node. You might want to wait for the compaction to finish on the other nodes first. Also, don't forget to trigger a gc() manually via JMX after the compaction has finished, to delete the obsolete files on each node. On 01/08/11 16:29, myreasoner wrote: On the node ...

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Teijo Holzer
Hi, yes, I was looking for this config as well. This is really simple to achieve: put the following line into /etc/security/limits.conf:

    cassandra - memlock 32

Then start Cassandra as the user cassandra, not as root (note there is never a need to run Cassandra as root, ...
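
The same entry with a quick verification step. The memlock value above looks truncated in the archive; 'unlimited' is shown here as an assumption, not the poster's value.

    # /etc/security/limits.conf -- let the cassandra user mlock memory
    # ('unlimited' is an assumption; the post's value is cut off)
    cassandra - memlock unlimited

    # verify the limit as the cassandra user
    su - cassandra -c 'ulimit -l'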

Recovering from a multi-node cluster failure caused by OOM on repairs

2011-07-26 Thread Teijo Holzer
Hi, I thought I'd share the following with this mailing list, as a number of other users seem to have had similar problems. We have the following set-up:

* OS: CentOS 5.5
* RAM: 16GB
* JVM heap size: 8GB (also tested with 14GB)
* Cassandra version: 0.7.6-2 (also tested with 0.7.7)
* Oracle JDK version: ...
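
For context, the heap sizes mentioned correspond to these cassandra-env.sh settings (a sketch; HEAP_NEWSIZE is a typical companion setting, not taken from the post):

    # cassandra-env.sh -- pin the JVM heap explicitly instead of auto-sizing
    MAX_HEAP_SIZE="8G"    # the post also mentions testing 14G
    HEAP_NEWSIZE="800M"   # assumption: young-gen size not given in the post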