Continuously increasing RAM usage

2010-05-27 Thread James Golick
We're seeing RAM usage continually climb until, eventually, Cassandra becomes unresponsive. The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am assuming that the memory usage is related to mmap'd IO. Fair assumption? I tried setting the IO mode to standard, but it seemed to

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-27 Thread gabriele renzi
On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: What is the use case? We end up with messed-up data in the database, so from time to time we run a MapReduce job to find irregular data. Why are you using Cassandra versus using data stored in HDFS or HBase? As of

Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Wed, May 26, 2010 at 8:00 PM, Sean Bridges sean.brid...@gmail.com wrote: So after CASSANDRA-579, anti compaction won't be done on the source node, and we can use more than 50% of the disk space if we use multiple column families? Sorry if I misunderstand, but #579 seems to only solve half

Batch_Mutate throws Uncaught exception

2010-05-27 Thread Moses Dinakaran
Hi, I am trying to use batch_mutate() with PHP Thrift. I was getting the following error. Fatal error: Uncaught exception 'cassandra_InvalidRequestException' in CORE/php/phpcassa/thrift/packages/cassandra/Cassandra.php:4869 Stack trace: #0

Re: Batch_Mutate throws Uncaught exception

2010-05-27 Thread Mishail
Hi, just to clarify. Are you trying to insert a couple of columns with key cache_pages in the ColumnFamily Page? Moses Dinakaran wrote: Hi, I am trying to use batch_mutate() with PHP Thrift. I was getting the following error.

Re: Continuously increasing RAM usage

2010-05-27 Thread Ian Soboroff
A lot of folks have reported this issue, and there are a few JIRAs related to it. Post the output of nodetool tpstats. Also, are there lots of GCs in the system.log? If so, are they something besides ParNew? Ian On Thu, May 27, 2010 at 2:32 AM, James Golick jamesgol...@gmail.com wrote:
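A minimal sketch of the system.log check Ian suggests: tally the GC log lines by collector and flag anything other than ParNew. It assumes each GC entry contains text of the form `GC for <collector>: <n> ms`. That pattern is an assumption, so adjust `GC_PATTERN` to whatever your log actually shows. `count_gc_events` and `non_parnew` are hypothetical helpers, not Cassandra tools.

```python
import re
from collections import Counter

# Assumed shape of a GC log entry: "... GC for <CollectorName>: <pause> ms ..."
GC_PATTERN = re.compile(r"GC for (\w+): (\d+) ms")


def count_gc_events(lines):
    """Count GC entries and sum their reported pause time, per collector."""
    counts = Counter()
    pause_ms = Counter()
    for line in lines:
        match = GC_PATTERN.search(line)
        if match:
            collector = match.group(1)
            counts[collector] += 1
            pause_ms[collector] += int(match.group(2))
    return counts, pause_ms


def non_parnew(counts):
    """Return only the collectors other than ParNew (e.g. a CMS/full collector)."""
    return {name: n for name, n in counts.items() if name != "ParNew"}
```

To use it, pass an open log file, for example `with open("system.log") as log: counts, pause_ms = count_gc_events(log)`. A large or growing count for a collector other than ParNew is the case Ian is asking about.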

Remove and BytesType

2010-05-27 Thread Bill de hOra
Saw some behaviour today on Cassandra 0.6.1: after running a remove command on a row in a CF whose CompareWith was BytesType, the row was still there, and still there after bouncing the server. This was the case for both Hector and the CLI. When I changed the CompareWith to UTF8Type, new rows added could

Re: Remove and BytesType

2010-05-27 Thread Philip Stanhope
Could you clarify what you mean by remove command? Remove all columns leaving a row key? Did you use nodetool to force a flush and then compact after GCGraceSeconds? On May 27, 2010, at 9:27 AM, Bill de hOra wrote: Saw some behaviour today on Cassandra 0.6.1 - After running a remove

Re: Cassandra's 2GB row limit and indexing

2010-05-27 Thread Jonathan Ellis
Yes, #16 (which is almost done for 0.7) will make this possible. On Wed, May 26, 2010 at 7:52 PM, Richard West r...@clearchaos.com wrote: Hi all, I'm currently looking at new database options for a URL shortener in order to scale well with increased traffic as we add new features. Cassandra

Re: Batch_Mutate throws Uncaught exception

2010-05-27 Thread Jonathan Ellis
You need to pull out the exception's why field, which explains what was invalid about the request. On Thu, May 27, 2010 at 2:45 AM, Moses Dinakaran mosesdinaka...@gmail.com wrote: Hi, I am trying to use batch_mutate() with PHP Thrift. I was getting the following error. Fatal error: 

Re: Remove and BytesType

2010-05-27 Thread Jonathan Ellis
A remove on a full row doesn't touch CompareWith at all. I think that's a red herring. More likely data in that row was created with a higher-res timestamp than the delete was issued at. On Thu, May 27, 2010 at 7:27 AM, Bill de hOra b...@dehora.net wrote: Saw some behaviour today on Cassandra
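Here is a small sketch of the timestamp explanation, using made-up times. It rests on two assumptions. First, a row delete shadows every column whose timestamp is less than or equal to the delete's timestamp. Second, one client wrote in nanoseconds while the remove was issued in milliseconds, which is the mix-up Bill later confirms in this thread. The helper names and values are hypothetical.

```python
# Hypothetical times, in seconds since the epoch.
WRITE_TIME_S = 1_274_966_820          # when the row was written
DELETE_TIME_S = WRITE_TIME_S + 5      # the remove is issued 5 seconds later


def to_nanos(seconds):
    return seconds * 1_000_000_000


def to_millis(seconds):
    return seconds * 1_000


def shadowed_by_delete(column_ts, delete_ts):
    # Assumed rule: a row delete removes columns written at or before its timestamp.
    return column_ts <= delete_ts


# Writer used nanoseconds, remover used milliseconds: the later delete loses.
mixed = shadowed_by_delete(to_nanos(WRITE_TIME_S), to_millis(DELETE_TIME_S))
# Both sides use the same unit: the delete wins, as expected.
consistent = shadowed_by_delete(to_millis(WRITE_TIME_S), to_millis(DELETE_TIME_S))
print(mixed, consistent)  # False True
```

Timestamps are plain integers compared numerically, so the unit is whatever the clients agree on. Mixing resolutions makes a later delete look far older than the data it is meant to remove.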

Re: Continuously increasing RAM usage

2010-05-27 Thread James Golick
Just upgraded to Sun JVM 1.6.0_20 and Cassandra 0.6.2. Will report back when I have data. On Thu, May 27, 2010 at 9:39 AM, Philip Stanhope pstanh...@wimba.com wrote: I've seen numerous anecdotal references that the Sun JVM performs better. Is there a reason why the Debian packaging for

Re: GMFD messages

2010-05-27 Thread Anthony Molinaro
On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote: This is a relic of when Gossip was over UDP and had to worry about packet size. I created https://issues.apache.org/jira/browse/CASSANDRA-1138 to remove those notifications. Ahh, okay, well it's odd that a limit was set even

Hector client usage

2010-05-27 Thread Atul Gosain
Hi, I'm trying to use the Hector client to insert and then read data from Cassandra. While I'm able to write the data and see it through the Cassandra CLI, I'm not able to read it from the program. I get the following error. What am I doing wrong in my program? Can someone help me here

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-27 Thread Jeremy Hanna
Is there anything holding you back from using it (if you would like to use it but currently cannot)? It would be nice if the output of the MapReduce job were a MutationOutputFormat to which we could write inserts/deletes, but I recall there is something on JIRA already, although I'm not sure if it

Re: Continuously increasing RAM usage

2010-05-27 Thread Kyusik Chung
Hi Philip, I think they chose to go with OpenJDK because Sun's is not open source. Here's what we did on Ubuntu 10.04 (if you're using a different Debian distro, you can probably do something very similar):

    # this install gives us the convenient add-apt-repository command
    sudo apt-get install

Re: Remove and BytesType

2010-05-27 Thread Bill de hOra
More likely data in that row was created with a higher-res timestamp than the delete was issued at. Indeed: the problem was nanos vs. millis, with a bit of clock skew thrown in :) Bill Jonathan Ellis wrote: remove to a full row doesn't touch comparewith at all. I think that's a red

Re: GMFD messages

2010-05-27 Thread Jonathan Ellis
Yes, Gossip goes through MD too. On Thu, May 27, 2010 at 11:03 AM, Anthony Molinaro antho...@alumni.caltech.edu wrote: On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote: This is a relic of when Gossip was over UDP and had to worry about packet size. I created

Re: Hector client usage

2010-05-27 Thread Jonathan Ellis
UnavailableException means all the nodes that should have this data are down. On Thu, May 27, 2010 at 12:01 PM, Atul Gosain atul.gos...@gmail.com wrote: Forgot to attach the class. On Thu, May 27, 2010 at 11:17 PM, Atul Gosain atul.gos...@gmail.com wrote: Hi, I'm trying to use Hector

Re: Large column/row inserts

2010-05-27 Thread Jonathan Ellis
JVM GC pause? If so, the improved JVM options in 0.6.2 should help some. Increasing the heap size is also a good candidate to help. On Thu, May 27, 2010 at 2:01 PM, Jones, Nick nick.jo...@amd.com wrote: Hi everyone, I'm using the Cassandra gem and have been trying to optimize inserting 400k-1M

Cassandra CF sharding

2010-05-27 Thread Maxim Kramarenko
Hello! We have a mail archive with one large CF for mail bodies. In our case, it's easy to shard the data into 5-10 CFs by customer ID. We would like to do this because: 1) We get more manageable instances, because we have many small CFs instead of one multi-TB CF on each node. 2) Better disk space usage

Re: cluster locks up from high MESSAGE-DESERIALIZER-POOL counts

2010-05-27 Thread Cagatay Kavukcuoglu
I think this is because as an optimization Cassandra sends a read request only to the closest replica and sends digest requests to other replicas for read repair. The same replica is probably getting chosen as the closest for all of your read requests. Maybe it would be a useful improvement to

Re: using more than 50% of disk space

2010-05-27 Thread gabriele renzi
On Thu, May 27, 2010 at 9:23 PM, Sean Bridges sean.brid...@gmail.com wrote: But doesn't having multiple similarly sized column families mean in-node compaction does not require 50% of disk? Looking at compaction manager, only 1 thread is doing a compaction, so we only need enough free disk
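Sean's argument can be expressed as arithmetic. The sketch below makes two assumptions, as his message does. Compactions run one at a time. A worst-case compaction rewrites an entire column family. Under those assumptions, the free space needed is roughly the size of the largest CF, not the whole data set. It ignores anticompaction and any other extra disk use, and the function names are hypothetical.

```python
def headroom_needed(cf_sizes_gb, concurrent_compactions=1):
    """Worst-case free space (GB) if each running compaction rewrites a whole CF."""
    largest = sorted(cf_sizes_gb, reverse=True)[:concurrent_compactions]
    return sum(largest)


def max_fill_fraction(cf_sizes_gb, concurrent_compactions=1):
    """Largest fraction of the disk the data can occupy while keeping that headroom free."""
    used = sum(cf_sizes_gb)
    return used / (used + headroom_needed(cf_sizes_gb, concurrent_compactions))


# One 500 GB CF versus the same data split into five 100 GB CFs.
print(max_fill_fraction([500]))      # 0.5
print(max_fill_fraction([100] * 5))  # 0.8333333333333334
```

Raising `concurrent_compactions` shows how the benefit shrinks if more than one compaction could run at once.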

Re: cluster locks up from high MESSAGE-DESERIALIZER-POOL counts

2010-05-27 Thread Edmond Lau
If this description is accurate, then it sounds like my only available workaround would be to not use multiget() and instead issue multiple get() calls to random nodes so that I can hit the other replicas. Edmond On Thu, May 27, 2010 at 2:36 PM, Cagatay Kavukcuoglu caga...@kavukcuoglu.org
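Here is a toy model of the read path Cagatay describes and of Edmond's proposed workaround. The node names and the fixed `proximity` ranking are hypothetical stand-ins for what Cassandra's snitch decides. The model shows one thing only. A deterministic "closest replica" choice sends every data read to the same node. Choosing a replica at random for each get() spreads the load.

```python
import random
from collections import Counter


def plan_read(replicas, proximity):
    """Full data read from the closest replica; digest reads from the rest."""
    data_node = min(replicas, key=proximity)
    digest_nodes = [node for node in replicas if node != data_node]
    return data_node, digest_nodes


def pick_random(replicas, rng=random):
    """Workaround: send each individual get() to a randomly chosen replica."""
    return rng.choice(replicas)


replicas = ["node-a", "node-b", "node-c"]                # hypothetical replica set
proximity = {"node-a": 2, "node-b": 1, "node-c": 3}.get  # lower value = closer

closest_load = Counter(plan_read(replicas, proximity)[0] for _ in range(1000))
rng = random.Random(0)
random_load = Counter(pick_random(replicas, rng) for _ in range(1000))
print(closest_load)  # every data read lands on node-b
print(random_load)   # reads spread across all three nodes
```

The trade-off is that the random choice ignores proximity, so some reads will go to a farther replica.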

Re: Questions regarding batch mutates and transactions

2010-05-27 Thread Todd Nine
Correct, Ran. It seems like the only way I'm going to get true mutations in a single op is to use Cages. Thankfully a majority of our application won't require it, just a few specialized components. On Wed, 2010-05-26 at 12:57 +0300, Ran Tavory wrote: The summary of your question is: is

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Steve Lihn
Mongo has it too. It could save a lot of development time if one could figure out how to port Mongo's query API and stored JavaScript to Cassandra. It would be great if Scala's list comprehensions could be used to write query-like code against a Cassandra schema. On Thu, May 27, 2010 at 11:05 AM,

Re: Cassandra-0.6.1 Crash Error: out of memory

2010-05-27 Thread Peng Guo
Thanks for your help :) On Thu, May 27, 2010 at 10:38 PM, Jonathan Ellis jbel...@gmail.com wrote: It looks like you simply don't have a large enough heap for all the in-flight data. Low-hanging fruit includes - upgrade to 0.6.2 (available from

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jake Luciani
I've secretly started working on this but have nothing to show yet :( I'm calling it SliceDiceReduce or SliceReduce. The plan is to use the JS Thrift bindings I've added for the 0.3 release of Thrift (out very soon?). This will allow the supplied JS to access the results like any other Thrift

Re: Thoughts on adding complex queries to Cassandra

2010-05-27 Thread Jeremy Davis
I agree, I had more than filtering results in mind. Though I had envisioned the results continuing to use List<ColumnOrSuperColumn> (and not JSON). You could still create new result columns that do not in any way exist in Cassandra, and you could still stuff JSON into any of the result columns. I

ec2 tests

2010-05-27 Thread Chris Dean
I'm interested in performing some simple performance tests on EC2. I was thinking of using py_stress and Cassandra deployed on 3 servers with one separate machine to run py_stress. Are there any particular configuration settings I should use? I was planning on changing the JVM heap size to

Re: Cassandra CF sharding

2010-05-27 Thread Jonathan Ellis
2) is correct, but for 1) I'm not sure what manageability improvements you anticipate from dealing with multiple entities instead of one. I'm not sure what you're thinking of for 3) but routing is done by key only. 2010/5/27 Maxim Kramarenko maxi...@trackstudio.com: Hello! We have mail archive
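To illustrate "routing is done by key only", here is a RandomPartitioner-style sketch. A key's token comes from an MD5 hash of the key, and the owning node is the first node on the ring whose token is greater than or equal to it. The ring tokens and node names are hypothetical, and replication beyond the primary node is left out. The column family never enters the calculation, so splitting one CF into several does not change which nodes hold a given key.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    """MD5 digest read as a signed big-endian integer, then made non-negative."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def primary_node(key: bytes, ring):
    """ring: list of (token, node) sorted by token. Note: no column family argument."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key))
    return ring[index % len(ring)][1]  # wrap around past the highest token


# Hypothetical three-node ring with evenly spaced tokens in [0, 2**127].
RING = [(i * 2**127 // 3, f"node-{i}") for i in (1, 2, 3)]

for customer_key in (b"customer:17", b"customer:42"):
    print(customer_key, "->", primary_node(customer_key, RING))
```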