Re: zookeeper, how do you feed the pets?

2010-05-17 Thread Patrick Hunt
Hi, ZK uses a quorum protocol (similar to, but not the same as, Paxos) for writes; as a result it's sensitive to inter-server latency. (However, reads are always local and therefore not affected.) Running a cluster fully within a colo you can achieve 15k writes/second, with a cluster distributed across
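To illustrate why quorum writes depend on inter-server latency, here is a minimal sketch that assumes a simple majority quorum (an assumption for illustration; the message only says the protocol is quorum-based). A write is acknowledged once more than half of the ensemble has accepted it, so the slowest link inside that majority bounds the write latency. The function names are hypothetical.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, majority size, and tolerated failures.
    for n in (3, 5, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption an even-sized ensemble tolerates no more failures than the next smaller odd size, which is why odd sizes are the usual choice.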

Re: list of columns

2010-05-17 Thread Bill de hOra
Agree with David: it's not there, and thinking about how the data is laid out on disk, it can't be done without changing core code or harming something else. "if this is a performance concern" It's not; it was to supply an administrative function on SuperColumns, but it would be good to not

RE: Nodes Levels of Hierarchy in Cassandra.

2010-05-17 Thread xmanach.ext
Hi Benjamin. OK, thanks for your answer. -Original Message- From: Benjamin Black [mailto:b...@b3k.us] Sent: Monday, 17 May 2010 00:08 To: user@cassandra.apache.org Subject: Re: Nodes Levels of Hierarchy in Cassandra. Not in Cassandra. Your description of the levels is not quite

Re: Load Balancing Mapper Tasks

2010-05-17 Thread Joost Ouwerkerk
At any given moment at least half of those threads are in the following state; what does it represent? Name: ROW-READ-STAGE:6 State: WAITING on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@fea6030 Total blocked: 44 Total waited: 479 Stack trace:

Re: Several CFs and partitioning : which key range is used

2010-05-17 Thread Jonathan Ellis
There is only one partitioner, and that alone is what determines key -> token mapping. CF has nothing to do with it. On Mon, May 17, 2010 at 4:55 AM, Miriam Allalouf miriam.allal...@gmail.com wrote: Hi, I have a basic question regarding key ranges and partitions. Assuming we have two CF column
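The point that the column family plays no part in placement can be sketched as follows. This is a minimal illustration that assumes an MD5-based hash partitioner and a hypothetical three-node ring; the function names, node names, and token values are invented for the example and are not Cassandra's actual code.

```python
import bisect
import hashlib


def token_for_key(key: bytes) -> int:
    """Hash a row key to an integer token (MD5 is used here for illustration)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def owner_of(key: bytes, ring: list) -> str:
    """Return the node whose token is the first one >= the key's token,
    wrapping around to the lowest token at the end of the ring."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for_key(key))
    return ring[index % len(ring)][1]


# Hypothetical three-node ring with evenly spaced tokens, sorted by token.
RING = sorted((i * (2**128 // 3), f"node{i}") for i in range(3))


def locate(column_family: str, key: bytes) -> str:
    """The column family is accepted but deliberately unused:
    placement depends on the key alone."""
    return owner_of(key, RING)
```

In this sketch, `locate("Users", b"k1")` and `locate("Events", b"k1")` always return the same node, because only the key is hashed.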

Re: Avro Example Code

2010-05-17 Thread Eric Evans
On Fri, 2010-05-14 at 14:52 -0600, David Wellman wrote: Does anyone have a good link or example code that we can use to spike on Avro with Cassandra? If you're using Python, the best place to look is the functional tests (see test/system), otherwise, Patrick's quick start

Re: Overfull node

2010-05-17 Thread Anthony Molinaro
I had this happen when I changed the seed node in a running cluster, and then started and stopped various nodes. I fixed it by restarting the seed node(s) (and waiting for it to be fully up), then restarting all the other nodes. -Anthony On Fri, May 14, 2010 at 05:11:40PM -0700, David Koblas

Re: Avro Example Code

2010-05-17 Thread Wellman, David
I spent the weekend working with Avro and some Java JUnit tests. I still have a lot of learning to do, but if others would like to use, add to or improve upon the tests then I would appreciate the feedback and help. David Wellman On May 17, 2010, at 10:16 AM, Eric Evans

nodetool causing OOM?

2010-05-17 Thread Ronald Park
Hello, We are getting our feet wet with Cassandra and have a test environment set up to do some heavy data insertion. [Heavy is relative: we are talking about 1M inserts in a 3-hour test.] Twice while running these tests, when we've tried to use 'nodetool' about an hour or so into the test,

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
Moving to the user@ list. http://wiki.apache.org/cassandra/HadoopSupport should be useful. On Mon, May 17, 2010 at 2:41 PM, Yan Virin jan.vi...@gmail.com wrote: Hi, Can someone explain how this works? As far as I know, there is no execution engine in Cassandra alone, so I assume that Hadoop

Re: cassandra cluster (2 node): one node is OK, another has this error: java.lang.AssertionError: invalid response count 2

2010-05-17 Thread Jonathan Ellis
Do both nodes see each other in nodetool ring? On Fri, May 14, 2010 at 4:47 PM, li wei liwei...@yahoo.com wrote: Hi, Guys, I have the latest Cassandra, 0.6.0-rc1. Connecting to node 1 from Java, node 1 is OK and node 2 has this error: (if connecting to node 2 from Java, node 2 is OK and node 1 has the same

Re: nodetool causing OOM?

2010-05-17 Thread Ronald Park
Brandon Williams wrote: On Mon, May 17, 2010 at 2:44 PM, Ronald Park ronald.p...@cbs.com wrote: Hello, We are getting our feet wet with Cassandra and have a test environment set up to do some heavy data insertion. [Heavy is relative: we are talking

Re: Cassandra training on May 21 in Palo Alto

2010-05-17 Thread S Ahmed
Jonathan, Curious how many people have signed up? I hope you will do another one soon! On Tue, May 11, 2010 at 12:42 PM, Vick Khera vi...@khera.org wrote: On Fri, May 7, 2010 at 6:56 AM, Matt Revelle mreve...@gmail.com wrote: Reston, VA is a good spot in the DC metro area for tech events.

JMX metrics for monitoring

2010-05-17 Thread Maxim Kramarenko
Hi! Which JMX metrics do you use for Cassandra monitoring? Which values can be used for alerts?

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Mark Greene
Can you provide us with the current JVM args? Also, what type of workload are you giving the ring (ops/s)? On Mon, May 17, 2010 at 6:39 PM, Curt Bererton c...@zipzapplay.com wrote: Hello Cassandra users+experts, Hopefully someone will be able to point me in the correct direction. We have

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Brandon Williams
On Mon, May 17, 2010 at 6:02 PM, Curt Bererton c...@zipzapplay.com wrote: So pretty much the defaults aside from the 7Gig max heap. CPU is totally hammered right now, and it is receiving 0 ops/sec from me since I disconnected it from our application right now until I can figure out what's

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Lee Parker
What are your storage-conf settings for Memtable thresholds? One thing that could cause lots of CPU usage is dumping the memtables too frequently and then having to do lots of compaction. With that much available heap space you could definitely go larger than the default thresholds. Also, do

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Lee Parker
Also, I am using batch_mutate for all of my writes. Lee Parker On Mon, May 17, 2010 at 7:11 PM, Lee Parker l...@socialagency.com wrote: What are your storage-conf settings for Memtable thresholds? One thing that could cause lots of CPU usage is dumping the memtables too frequently and then

Re: Hadoop over Cassandra

2010-05-17 Thread Jonathan Ellis
On Mon, May 17, 2010 at 4:12 PM, Vick Khera vi...@khera.org wrote: On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis jbel...@gmail.com wrote: Moving to the user@ list. http://wiki.apache.org/cassandra/HadoopSupport should be useful. That document doesn't really answer the is data locality

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Thanks for the help, guys. Answering the first question first: both cores are pegged: Cpu0 : 43.8%us, 34.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 22.1%st Cpu1 : 40.5%us, 36.2%sy, 0.0%ni, 0.4%id, 0.0%wa, 0.0%hi, 0.2%si, 22.6%st Mem: 7872040k total, 3620180k used, 4251860k

Re: Problems running Cassandra 0.6.1 on large EC2 instances.

2010-05-17 Thread Curt Bererton
Agreed, and I just saw in storage-conf that a higher value for MemtableFlushAfterMinutes is suggested; otherwise you might get a "flush storm" of all your memtables flushing at once. I've changed that as well. -- Curt, ZipZapPlay Inc., www.PlayCrafter.com,

IO errors after upgrading from 0.5.1 to 0.6

2010-05-17 Thread Stephen Hamer
After upgrading my cluster from 0.5.1 to the 0.6 branch (commit 1206bcf in git), I am seeing lots of IO errors in the log output. Two questions: 1. Is this a sign that I have corrupt data? Is there some way for me to recover it or at the very least remove the bad data? 2. If this is an

Re: JMX metrics for monitoring

2010-05-17 Thread Ran Tavory
There are many, but here's what I found useful so far: Per CF you have: - Recent read/write latency - PendingTasks - Read/Write count Globally you have, for each of the stages (e.g. org.apache.cassandra.concurrent:type=ROW-READ-STAGE): - PendingTasks - ActiveCount ... and as you go you'll find

Re: IO errors after upgrading from 0.5.1 to 0.6

2010-05-17 Thread Stephen Hamer
I found out what was wrong. The schema file had gotten changed but not deployed to the cluster recently. During the migration the new schema was used. A column family got switched from a normal column family to a super column family. Stephen Hamer On Mon, May 17, 2010 at 6:16 PM, Stephen Hamer

Multiple hard disks configuration

2010-05-17 Thread Ma Xiao
Hi all, We recently set up 5 nodes running Cassandra, each with 4 x 1.5TB drives. I installed the OS (Ubuntu 9.10 Server Edition) on one of the drives and made each of the other disks a single partition, then I listed 4 paths under DataFileDirectory. My question is what's going to happen when one of the disk

Disk usage doubled after nodetool compact

2010-05-17 Thread Arie Keren
After running the nodetool compact command, the disk usage doubled. nodetool info reports that the load is 74GB (same as before compaction), while the size of the data folder on disk is 133GB (it was about 74GB before compaction).

Data migration from mysql to cassandra

2010-05-17 Thread Beier Cai
I'm currently moving my existing MySQL database to Cassandra. One particular problem I have is migrating all those integer auto-increment ids to Cassandra's code-generated keys (like UUID). One approach is to dump all the existing records into Cassandra and start with UUID for new records,
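One possible way to handle the existing ids, offered here as an assumption and not something proposed in the message itself, is to derive a name-based UUID from the table name and the old integer id. The mapping is deterministic, so references between migrated rows can be recomputed without a lookup table, while new records get random UUIDs. The namespace string and function names below are hypothetical.

```python
import uuid

# Hypothetical namespace for this migration; any fixed UUID works
# as long as it never changes between runs.
MIGRATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "migration.example.com")


def key_for_legacy_id(table: str, legacy_id: int) -> uuid.UUID:
    """Derive a stable UUID from a MySQL table name and auto-increment id."""
    return uuid.uuid5(MIGRATION_NAMESPACE, f"{table}:{legacy_id}")


def key_for_new_record() -> uuid.UUID:
    """New records have no legacy id, so they get a random UUID."""
    return uuid.uuid4()
```

Including the table name in the hashed string keeps row 42 of one table from colliding with row 42 of another.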