Re: writing speed test

2010-06-02 Thread Shuai Yuan
Still seems to be MEM. However, it's hard to believe that constant writing (even of a great amount of data) needs so much MEM (16GB). The process is quite simple: input_data -> memtable -> flush to disk, right? What does Cassandra need so much MEM for? Thanks! On 2010-06-02 16:24 +0800, lwl wrote: No.

Re: writing speed test

2010-06-02 Thread Shuai Yuan
Thanks Peter! In my test application, for each record: rowkey = rand() * 4, about 64B; column * 20 = rand() * 20, about 320B. I use batch_insert(rowkey, col*20) in Thrift. Kevin Yuan. From: Peter Schüller sc...@spotify.com To: user@cassandra.apache.org:
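A minimal sketch of such a writer against the Cassandra 0.6 Thrift API follows. The keyspace and column-family names are placeholders, the sizes are read loosely from the description above (one ~64B key, 20 columns of ~320B each), and the generated-code signatures are assumptions based on the 0.6 interface (string keys, byte[] column names and values):

    import java.util.*;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class WriteTest {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            Random rand = new Random();
            // ~64-byte random row key (keys are strings in 0.6)
            String rowKey = String.format("%064x", new java.math.BigInteger(256, rand));

            // 20 columns with ~320 random bytes each
            List<ColumnOrSuperColumn> cols = new ArrayList<ColumnOrSuperColumn>();
            for (int i = 0; i < 20; i++) {
                byte[] value = new byte[320];
                rand.nextBytes(value);
                Column c = new Column(("col" + i).getBytes("UTF-8"), value,
                                      System.currentTimeMillis());
                ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
                cosc.setColumn(c);
                cols.add(cosc);
            }
            Map<String, List<ColumnOrSuperColumn>> cfmap =
                Collections.singletonMap("Standard1", cols);

            client.batch_insert("Keyspace1", rowKey, cfmap, ConsistencyLevel.ONE);
            socket.close();
        }
    }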

Re: Range search on keys not working?

2010-06-02 Thread Torsten Curdt
Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn da...@lookin2.com wrote: Range search on keys is not working for me. I was assured in earlier threads that range search would work, but the results would not be ordered. I'm trying to

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
The previous thread where we discussed this is called "key is sorted?" On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn da...@lookin2.com wrote: I'm not using OPP. But I was assured on earlier threads (I asked several times to be sure) that it would work as stated below: the results would not

RE: Range search on keys not working?

2010-06-02 Thread Dr. Martin Grabmüller
When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use the empty string as the end key and limit the number of returned keys, as you did with the 'max' value. If I understand correctly, the end key is used to generate an end token by hashing it, and there is not

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
In other words, I should check the values as I iterate, and stop iterating when I get out of range? I'll try that! On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: When not using OPP, you should not use something like 'CATEGORY/' as the end key. Use

RE: Range search on keys not working?

2010-06-02 Thread Dr. Martin Grabmüller
I think you can specify an end key, but it should be a key which does exist in your column family. But maybe I'm off the track here and someone else here knows more about this key range stuff. Martin From: David Boxenhorn [mailto:da...@lookin2.com]

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
Here is the relevant part of the previous thread: Thank you. That is very good news. I can sort the results myself - what is important is that I get them! On Thu, May 13, 2010 at 2:42 AM, Vijay vijay2...@gmail.com wrote: If you use Random partitioner, You will *NOT* get RowKey's sorted. (Columns

Re: Range search on keys not working?

2010-06-02 Thread Ben Browning
Martin, On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote: I think you can specify an end key, but it should be a key which does exist in your column family. Logically, it doesn't make sense to ever specify an end key with random partitioner. If you

Re: Range search on keys not working?

2010-06-02 Thread Sylvain Lebresne
So why do the start and finish range parameters exist? Because especially if you want to iterate over all your keys (which, as stated by Ben above, is the only meaningful way to use get_range_slices() with the random partitioner), you'll want to paginate that. And that's where the 'start' and
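A sketch of that pagination pattern (0.6 Thrift API; keyspace and column-family names are placeholders): start from the empty key, and on each round reuse the last key returned as the next start key, remembering that it will be repeated:

    // Paginate over all keys under the random partitioner.
    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

    String startKey = "";
    while (true) {
        KeyRange range = new KeyRange(100);            // page size
        range.setStart_key(startKey);
        range.setEnd_key("");                          // empty end key: no upper bound
        List<KeySlice> page = client.get_range_slices(
            "Keyspace1", new ColumnParent("Standard1"),
            predicate, range, ConsistencyLevel.ONE);
        for (KeySlice slice : page) {
            // process slice.getKey() / slice.getColumns() ...
        }
        if (page.size() < 100) break;                  // short page: we are done
        startKey = page.get(page.size() - 1).getKey(); // appears again next round; skip it
    }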

Re: Range search on keys not working?

2010-06-02 Thread David Boxenhorn
I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another partitioner that did this.) That way we could get everything we need from each node separately. The results would not be ordered, but they

Re: Range search on keys not working?

2010-06-02 Thread Jonathan Shook
Can you clarify what you mean by 'random between nodes' ? On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn da...@lookin2.com wrote: I see. But we could make this work if the random partitioner was random only between nodes, but was still ordered within each node. (Or if there were another

Re: Handling disk-full scenarios

2010-06-02 Thread Ian Soboroff
Ok, answered part of this myself. You can stop a node, move files around on the data disks, as long as they stay in the right keyspace directories, and all is fine. Now, I have a single Data.db file which is 900GB and is compacted. The drive it's on is only 1.5TB, so it can't anticompact at all.

Giant sets of ordered data

2010-06-02 Thread David Boxenhorn
How do I handle giant sets of ordered data, e.g. by timestamps, which I want to access by range? I can't put all the data into a supercolumn, because it's loaded into memory at once, and it's too much data. Am I forced to use an order-preserving partitioner? I don't want the headache. Is there

Re: Giant sets of ordered data

2010-06-02 Thread Ben Browning
/hour/day/year depending on the volume of your data. Something like the following:

SomeTimeData: {          // column family
  20100601: {            // key: yyyymmdd
    123456789: value1,   // column name is milliseconds since epoch
    123456799: value2
  },
  20100602: {
    12345889: value3
  }
}

Now you can use
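A hedged sketch of reading back one day's bucket in that layout (0.6 Thrift API; the column-family name, and encoding column names as big-endian longs under a LongType comparator, are assumptions):

    import java.nio.ByteBuffer;

    // All events in the 2010-06-01 bucket, ordered by column name
    // (milliseconds since epoch as big-endian 8-byte longs).
    byte[] start = ByteBuffer.allocate(8).putLong(0L).array();
    byte[] finish = ByteBuffer.allocate(8).putLong(Long.MAX_VALUE).array();
    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(start, finish, false, 1000));

    List<ColumnOrSuperColumn> events = client.get_slice(
        "Keyspace1", "20100601", new ColumnParent("SomeTimeData"),
        predicate, ConsistencyLevel.ONE);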

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Either OPP by key, or within a row by column name. I'd suggest the latter. If you have structured data to stick under a column (named by the timestamp), then you can serialize and unserialize it yourself, or you can use a supercolumn. It's effectively the same thing. Cassandra only provides the

Re: Giant sets of ordered data

2010-06-02 Thread David Boxenhorn
Let's say you're logging events, and you have billions of events. What if the events come in bursts, so within a day there are millions of events, but they all come within microseconds of each other a few times a day? How do you find the events that happened on a particular day if you can't store

Re: Heterogeneous Cassandra Cluster

2010-06-02 Thread Nahor
On 2010-06-02 3:18, David Boxenhorn wrote: Is it possible to make a heterogeneous Cassandra cluster, with both Linux and Windows nodes? I tried doing it and got Error in ThreadPoolExecutor java.lang.NullPointerException Not sure if this is due to the Linux/Windows mix or something else.

Re: Heterogeneous Cassandra Cluster

2010-06-02 Thread David Boxenhorn
Our replication factor was 1, so that wasn't the problem. (We tried other replication factors too, just in case, but it didn't help.) On Wed, Jun 2, 2010 at 7:51 PM, Nahor nahor.j+gm...@gmail.com wrote: On 2010-06-02 3:18, David Boxenhorn wrote: Is it possible to

Re: Error during startup

2010-06-02 Thread Gary Dusbabek
I was able to reproduce the error by starting up a node using RandomPartitioner, killing it, switching to OrderPreservingPartitioner, restarting, killing, and switching back to RandomPartitioner: BANG! So it looks like you tinkered with the partitioner at some point. This has the unfortunate effect of corrupting your

Capacity planning and Re: Handling disk-full scenarios

2010-06-02 Thread Ian Soboroff
Reading some more (someone break in when I lose my clue ;-) Reading the streams page in the wiki about anticompaction, I think the best approach to take when a node gets its disks overfull is to set the compaction thresholds to 0 on all nodes, decommission the overfull node, wait for stuff to

Re: Continuously increasing RAM usage

2010-06-02 Thread Paul Brown
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up. I've seen this sort of issue in systems that make heavy use of java.util.concurrent queues/executors, e.g.:

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
If you want to do range queries on the keys, you can use OPP to do this (example using UTF-8 lexicographic keys, with bursts split across rows according to row size limits):

Events: {
  20100601.05.30.003: {        // row key, same format as the column names
    20100601.05.30.003: value,
    20100601.05.30.007: value,
    ...
  }
}

With a future
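A hypothetical key builder in the same spirit, assuming a "yyyyMMdd.HH.mm.SSS" reading of the keys above (zero-padded segments keep lexicographic UTF-8 order aligned with chronological order, which is what OPP range queries need):

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // eventMillis is the event's timestamp in milliseconds since epoch.
    SimpleDateFormat keyFormat = new SimpleDateFormat("yyyyMMdd.HH.mm.SSS");
    String rowKey = keyFormat.format(new Date(eventMillis));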

Re: Giant sets of ordered data

2010-06-02 Thread Jonathan Shook
Insert "if you want to use long values for keys and column names" above paragraph 2. I forgot that part. On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook jsh...@gmail.com wrote: If you want to do range queries on the keys, you can use OPP to do this: (example using UTF-8 lexicographic keys, with

Re: Continuously increasing RAM usage

2010-06-02 Thread Torsten Curdt
We've also seen something like this. Will soon investigate and try again with 0.6.2. On Wed, Jun 2, 2010 at 20:27, Paul Brown paulrbr...@gmail.com wrote: FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1, SUN JDK 1.6.0_b20. I will try to get some heap dumps to see

Changing replication factor from 2 to 3

2010-06-02 Thread Eric Halpern
We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and then reconfigure the existing nodes to

Re: Changing replication factor from 2 to 3

2010-06-02 Thread Rob Coli
On 6/2/10 12:49 PM, Eric Halpern wrote: We'd like to double our cluster size from 4 to 8 and increase our replication factor from 2 to 3. Is there any special procedure we need to follow to increase replication? Is it sufficient to just start the new nodes with the replication factor of 3 and

Re: Read operation with CL.ALL, not yet supported?

2010-06-02 Thread Yuki Morishita
Gary, Thanks for the reply. I've opened an issue at https://issues.apache.org/jira/browse/CASSANDRA-1152 Yuki 2010/6/3 Gary Dusbabek gdusba...@gmail.com: Yuki, Can you file a jira ticket for this (https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates that this should be

Re: Continuously increasing RAM usage

2010-06-02 Thread Jake Luciani
I've started seeing this issue as well, running 0.6.2. One interesting thing I happened upon: I explicitly called the GC via jconsole and the heap dropped completely, fixing the issue. When you explicitly call System.gc() it does a full sweep. I'm wondering if this issue has to do with the GC
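For reference, jconsole's "Perform GC" button invokes the gc operation on the java.lang:type=Memory MBean, which, like System.gc(), requests a full collection. The same call can be made programmatically:

    import java.lang.management.ManagementFactory;

    // Equivalent of jconsole's "Perform GC": requests a full collection.
    // Note the JVM ignores explicit GC requests under -XX:+DisableExplicitGC.
    ManagementFactory.getMemoryMXBean().gc();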

Effective cache size

2010-06-02 Thread David King
If I go to fetch some row given the rack-unaware placement strategy, the default snitch, and CL==ONE, the node that is asked is the first node in the ring with the datum that is currently up; a checksum is then sent to the replicas to trigger read repair as appropriate. So with the row cache,