Still seems MEM.
However, it's hard to believe that constant writing (even of a great amount
of data) needs so much MEM (16GB). The process is quite simple:
input_data -> memtable -> flush to disk
right? What does Cassandra need so much MEM for?
Thanks!
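For reference, 0.6 bounds memtable size with thresholds in storage-conf.xml,
and every column family keeps its own memtable; key and row caches also live
on the heap. So the write path can pin roughly (number of CFs) x (flush
threshold) plus caches and flush buffers. A sketch of the relevant settings
(values illustrative, not recommendations):

<!-- storage-conf.xml excerpt (0.6-era settings; values illustrative) -->
<MemtableThroughputInMB>64</MemtableThroughputInMB>
<MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>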
On 2010-06-02 16:24 +0800, lwl wrote:
No.
Thanks Peter!
In my test application, for each record,
rowkey - rand() * 4, about 64B
column * 20 - rand() * 20, about 320B
I use batch_insert(rowkey, col*20) in thrift.
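A minimal sketch of that insert against the 0.6 Thrift API (the keyspace and
column family names here are hypothetical):

import java.util.*;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();
        Random rand = new Random();
        String rowKey = Long.toString(rand.nextLong());      // random key, as in the test
        List<ColumnOrSuperColumn> cols = new ArrayList<ColumnOrSuperColumn>();
        long ts = System.currentTimeMillis() * 1000;         // microsecond timestamp
        for (int i = 0; i < 20; i++) {                       // 20 columns per record
            Column c = new Column(("col" + i).getBytes("UTF-8"),
                                  Long.toString(rand.nextLong()).getBytes("UTF-8"), ts);
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(c);
            cols.add(cosc);
        }
        client.batch_insert("TestKS", rowKey,
                            Collections.singletonMap("TestCF", cols),
                            ConsistencyLevel.ONE);
        socket.close();
    }
}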
Kevin Yuan
From: Peter Schüller sc...@spotify.com
To: user@cassandra.apache.org
Sounds like you are not using an order preserving partitioner?
On Wed, Jun 2, 2010 at 13:48, David Boxenhorn da...@lookin2.com wrote:
Range search on keys is not working for me. I was assured in earlier threads
that range search would work, but the results would not be ordered.
I'm trying to
The previous thread where we discussed this is called "key is sorted?"
On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn da...@lookin2.com wrote:
I'm not using OPP. But I was assured on earlier threads (I asked several
times to be sure) that it would work as stated below: the results would not
When not using OPP, you should not use something like 'CATEGORY/' as the end
key.
Use the empty string as the end key and limit the number of returned keys, as
you did with
the 'max' value.
If I understand correctly, the end key is used to generate an end token by
hashing it, and
there is not
In other words, I should check the values as I iterate, and stop iterating
when I get out of range?
I'll try that!
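For concreteness, a single such call against the 0.6 Thrift API might look
like this (keyspace/CF names hypothetical). Note that with the random
partitioner the results come back in token order, not key order, so matching
keys are scattered and the prefix check has to filter every page rather than
stop the scan early:

SlicePredicate pred = new SlicePredicate();
pred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
KeyRange range = new KeyRange();
range.setStart_key("");            // start of the ring
range.setEnd_key("");              // empty end key, as advised
range.setCount(500);               // the 'max' value
List<KeySlice> rows = client.get_range_slices("MyKS", new ColumnParent("MyCF"),
                                              pred, range, ConsistencyLevel.ONE);
for (KeySlice row : rows) {
    if (row.getKey().startsWith("CATEGORY/")) {
        // in range: process this row
    }
}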
On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller
martin.grabmuel...@eleven.de wrote:
When not using OPP, you should not use something like 'CATEGORY/' as the
end key.
Use
I think you can specify an end key, but it should be a key which does exist in
your column family.
But maybe I'm off the track here and someone else here knows more about this
key range stuff.
Martin
From: David Boxenhorn [mailto:da...@lookin2.com]
Here is the relevant part of the previous thread:
Thank you. That is very good news. I can sort the results myself - what is
important is that I get them!
On Thu, May 13, 2010 at 2:42 AM, Vijay vijay2...@gmail.com wrote:
If you use the Random partitioner, you will *NOT* get RowKeys sorted. (Columns
Martin,
On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller
martin.grabmuel...@eleven.de wrote:
I think you can specify an end key, but it should be a key which does exist
in your column family.
Logically, it doesn't make sense to ever specify an end key with the
random partitioner. If you
So why do the start and finish range parameters exist?
Because especially if you want to iterate over all your keys (which, as
stated by Ben above, is the only meaningful way to use get_range_slices()
with the random partitioner), you'll want to paginate that. And that's
where the 'start' and
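A sketch of that pagination pattern (again assuming the 0.6 Thrift API, with
hypothetical names): feed the last key of each page back in as the next start
key, remembering that the start key is inclusive:

SlicePredicate pred = new SlicePredicate();
pred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
ColumnParent parent = new ColumnParent("MyCF");
String start = "";                                  // empty start: beginning of ring
boolean firstPage = true;
while (true) {
    KeyRange range = new KeyRange();
    range.setStart_key(start);
    range.setEnd_key("");                           // no meaningful end key under RP
    range.setCount(500);
    List<KeySlice> page = client.get_range_slices("MyKS", parent, pred,
                                                  range, ConsistencyLevel.ONE);
    for (int i = 0; i < page.size(); i++) {
        if (i == 0 && !firstPage) continue;         // inclusive start: skip repeated key
        // ... process page.get(i) ...
    }
    if (page.size() < 500) break;                   // short page: end of ring
    start = page.get(page.size() - 1).getKey();
    firstPage = false;
}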
I see. But we could make this work if the random partitioner was random only
between nodes, but was still ordered within each node. (Or if there were
another partitioner that did this.) That way we could get everything we need
from each node separately. The results would not be ordered, but they
Can you clarify what you mean by 'random between nodes' ?
On Wed, Jun 2, 2010 at 8:15 AM, David Boxenhorn da...@lookin2.com wrote:
I see. But we could make this work if the random partitioner was random only
between nodes, but was still ordered within each node. (Or if there were
another
Ok, answered part of this myself. You can stop a node, move files around on
the data disks, as long as they stay in the right keyspace directories, and
all is fine.
Now, I have a single Data.db file which is 900GB and is compacted. The
drive it's on is only 1.5TB, so it can't anticompact at all.
How do I handle giant sets of ordered data, e.g. by timestamps, which I want
to access by range?
I can't put all the data into a supercolumn, because it's loaded into memory
at once, and it's too much data.
Am I forced to use an order-preserving partitioner? I don't want the
headache. Is there
/hour/day/year depending on the volume of your data.
Something like the following:
SomeTimeData: { // columnfamily
20100601: { // key, yyyymmdd
123456789: value1, // column name is milliseconds since epoch
123456799: value2
},
20100602: {
12345889: value3
}
}
Now you can use a column slice over a day's row to fetch the events in a
given time range, as sketched below.
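For example (0.6 Thrift API; the names are hypothetical, and this assumes the
CF's comparator is LongType with column names stored as big-endian longs):

byte[] from = java.nio.ByteBuffer.allocate(8).putLong(fromMillis).array();
byte[] to   = java.nio.ByteBuffer.allocate(8).putLong(toMillis).array();
SlicePredicate pred = new SlicePredicate();
pred.setSlice_range(new SliceRange(from, to, false, 10000));   // up to 10k events
List<ColumnOrSuperColumn> events = client.get_slice(
        "MyKS", "20100601", new ColumnParent("SomeTimeData"),
        pred, ConsistencyLevel.ONE);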
Either OPP by key, or within a row by column name. I'd suggest the latter.
If you have structured data to stick under a column (named by the
timestamp), then you can serialize and unserialize it yourself, or you
can use a supercolumn. It's effectively the same thing. Cassandra
only provides the
Let's say you're logging events, and you have billions of events. What if
the events come in bursts, so within a day there are millions of events, but
they all come within microseconds of each other a few times a day? How do
you find the events that happened on a particular day if you can't store
On 2010-06-02 3:18, David Boxenhorn wrote:
Is it possible to make a heterogeneous Cassandra cluster, with both
Linux and Windows nodes? I tried doing it and got
Error in ThreadPoolExecutor
java.lang.NullPointerException
Not sure if this is due to the Linux/Windows mix or something else.
Our replication factor was 1, so that wasn't the problem. (We tried other
replication factors too, just in case, but it didn't help.)
On Wed, Jun 2, 2010 at 7:51 PM, Nahor nahor.j+gm...@gmail.com wrote:
On 2010-06-02 3:18, David Boxenhorn wrote:
Is it possible to
I was able to reproduce the error by starting up a node using
RandomPartitioner, killing it, switching to OrderPreservingPartitioner,
restarting, killing, switching back to RandomPartitioner, BANG!
So it looks like you tinkered with the partitioner at some point.
This has the unfortunate effect of corrupting your
Reading some more (someone break in when I lose my clue ;-)
Reading the streams page in the wiki about anticompaction, I think the best
approach to take when a node gets its disks overfull, is to set the
compaction thresholds to 0 on all nodes, decommission the overfull node,
wait for stuff to
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1,
SUN JDK 1.6.0_b20. I will try to get some heap dumps to see what's building up.
I've seen this sort of issue in systems that make heavy use of
java.util.concurrent queues/executors, e.g.:
If you want to do range queries on the keys, you can use OPP to do this:
(example using UTF-8 lexicographic keys, with bursts split across rows
according to row size limits)
Events: {
20100601.05.30.003: {
20100601.05.30.003: value
20100601.05.30.007: value
...
}
}
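With OPP the keys themselves are stored in lexicographic order, so a
key-range query returns a contiguous chunk directly. A sketch (0.6 Thrift
API, hypothetical names):

SlicePredicate pred = new SlicePredicate();
pred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10000));
KeyRange day = new KeyRange();
day.setStart_key("20100601");      // inclusive start of the day
day.setEnd_key("20100602");        // end key is meaningful under OPP
day.setCount(1000);
List<KeySlice> rows = client.get_range_slices("MyKS", new ColumnParent("Events"),
                                              pred, day, ConsistencyLevel.ONE);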
With a future
Insert "if you want to use long values for keys and column names" above
paragraph 2. I forgot that part.
On Wed, Jun 2, 2010 at 1:29 PM, Jonathan Shook jsh...@gmail.com wrote:
If you want to do range queries on the keys, you can use OPP to do this:
(example using UTF-8 lexicographic keys, with
We've also seen something like this. Will soon investigate and try
again with 0.6.2
On Wed, Jun 2, 2010 at 20:27, Paul Brown paulrbr...@gmail.com wrote:
FWIW, I'm seeing similar issues on a cluster. Three nodes, Cassandra 0.6.1,
SUN JDK 1.6.0_b20. I will try to get some heap dumps to see
We'd like to double our cluster size from 4 to 8 and increase our replication
factor from 2 to 3.
Is there any special procedure we need to follow to increase replication?
Is it sufficient to just start the new nodes with the replication factor of
3 and then reconfigure the existing nodes to
On 6/2/10 12:49 PM, Eric Halpern wrote:
We'd like to double our cluster size from 4 to 8 and increase our replication
factor from 2 to 3.
Is there any special procedure we need to follow to increase replication?
Is it sufficient to just start the new nodes with the replication factor of
3 and
Gary,
Thanks for reply. I've opened an issue at
https://issues.apache.org/jira/browse/CASSANDRA-1152
Yuki
2010/6/3 Gary Dusbabek gdusba...@gmail.com:
Yuki,
Can you file a jira ticket for this
(https://issues.apache.org/jira/browse/CASSANDRA)? The wiki indicates
that this should be
I've started seeing this issue as well. Running 0.6.2.
One interesting thing I happened upon: I explicitly called the GC via
jconsole and the heap dropped completely, fixing the issue. When you
explicitly call System.gc() it does a full sweep. I'm wondering if this
issue has to do with the GC
If I go to fetch some row given the rack-unaware placement strategy, the
default snitch, and CL==ONE, the node that is asked is the first node in the
ring with the datum that is currently up; then a checksum is sent to the
replicas to trigger read repair as appropriate. So with the row cache,