Because they are occurring in parallel.
So if a range is out of sync between A-B and A-C, A will receive the
repairing stream from both (in any order) and will apply mutations based on
that and the usual overwrite rules, which will necessarily exclude one of the
repairing streams, and that data will not
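The overwrite rule in question can be sketched in a few lines of Python - a simplification of timestamp-based last-write-wins, not Cassandra's actual code; the names are made up for illustration:

```python
# Simplified last-write-wins reconciliation: whichever column version
# carries the highest timestamp survives, regardless of which repairing
# stream delivered it first. (Illustrative sketch, not Cassandra code.)
def merge(local, streams):
    # local and each stream map column -> (value, timestamp)
    for stream in streams:
        for col, (val, ts) in stream.items():
            if col not in local or ts > local[col][1]:
                local[col] = (val, ts)
    return local
```

Because only timestamps are compared, applying the streams from B and C in either order converges to the same result, which is why the arrival order doesn't matter.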
Hi,
We're currently in the planning stage of a new project which needs a low
latency, persistent key/value store with a roughly 60:40 read/write split.
We're trying to establish if Cassandra is a good fit for this and in particular
what the hardware requirements would be to have the majority
thanks a lot for all the help! I have gone through the steps and
successfully brought up the node2 :)
On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen yulin...@gmail.com wrote:
Because the file only preserves the keys of the records, not the whole records.
Records for those saved keys will be loaded into
Somewhere I remember discussions about issues with the merkle tree range
splitting or some such that resulted in repair always thinking a little bit of
data was out of sync.
If you want to get a better idea about what's been transferred, turn the logging
up to DEBUG or turn it up just for
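For the per-component option, something like the following in conf/log4j-server.properties leaves the root logger at INFO and raises only the streaming and repair (anti-entropy) classes - the class names here are taken from the 0.7/0.8 source layout, so double-check them against your version:

```properties
# conf/log4j-server.properties - DEBUG for streaming/repair only
log4j.logger.org.apache.cassandra.streaming=DEBUG
log4j.logger.org.apache.cassandra.service.AntiEntropyService=DEBUG
```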
IIRC cassandra 0.7 needs thrift 0.5, are you using that version?
Perhaps try grabbing the cassandra 0.7 version for one of the pre built clients
(pycassa, hector etc) to check things work and then check you are using the
same thrift version.
Cheers
-
Aaron Morton
Freelance
Hi All,
I have a program that crunches through around 3 billion calculations. We
store the result of each of these in cassandra to later query once in order
to create some vectors. Our processing is limited by Cassandra now, rather
than the calculations themselves.
I was wondering what settings
Just found out that for changes made via cassandra-cli, the schema change didn't
reach node2, and node2 became unreachable.
I followed this document:
http://wiki.apache.org/cassandra/FAQ#schema_disagreement
but after that I just ended up with two schema versions:
ddcada52-c96a-11e0-99af-3bd951658d61: [node1,
--
All the best,
Bruno Sinkovic
Office : +31.20.893.2375
Mobile: +31.631.644.719
Skype: bruno.sinkovic
Fax: +31.20.203.1189
Where is your bottleneck?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
On Thu, Aug 18, 2011 at 6:08 AM, Paul Loy ketera...@gmail.com wrote:
Hi All,
I have a program that crunches through around 3 billion calculations. We
store the result of each of these in cassandra to
Hi All,
This is regarding help to resolve connection refused error on Cassandra
client API.
I have installed cassandra-0.8.4 on three machines and tried to upload a file
from HDFS to Cassandra via a Hadoop map-reduce program, but ran into a
connection refused error.
But, the same code is
What is rpc_address set to in cassandra.yaml?
Try setting it to 0.0.0.0 to be sure it's listening for external traffic.
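For reference, the relevant cassandra.yaml fragment would look something like this (0.8-era defaults; note that listen_address, unlike rpc_address, must not be set to 0.0.0.0):

```yaml
# cassandra.yaml - accept thrift connections on all interfaces
rpc_address: 0.0.0.0
rpc_port: 9160
```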
On Thu, Aug 18, 2011 at 8:37 AM, Thamizh tceg...@yahoo.co.in wrote:
Hi All,
This is regarding help to resolve connection refused error on Cassandra
client API.
I have
Are you writing lots of tiny rows or a few very large rows? Are you batching
mutations? Is the loading disk, CPU, or network bound?
-Jake
On Thu, Aug 18, 2011 at 7:08 AM, Paul Loy ketera...@gmail.com wrote:
Hi All,
I have a program that crunches through around 3 billion calculations. We
Philippe,
Besides the system keyspace, we have only one user keyspace. However, it's good
to know that we can also try repairing one CF at a time.
We have two concurrent compactors configured. Will change that to one.
Huy
On Wed, Aug 17, 2011 at 6:10 PM, Philippe watche...@gmail.com wrote:
Huy,
Unfortunately repairing one CF at a time didn't help in my case because it
still streams all CFs and that triggers lots of compactions.
On Aug 18, 2011 3:48 PM, Huy Le hu...@springpartners.com wrote:
Thanks Ed/Aaron, that really helped a lot.
Just to clarify on the question of writes (sorry, I worded that badly) - do
write operations insert rows into the cache on all nodes in the replica set or
does the cache only get populated on reads?
Aaron - in terms of scale, our ultimate goal is to
I am running cassandra 0.7.8. pycassa 1.1.0
Nodes=7, RF=3
This problem started a few months ago and only occurs sporadically.
I receive notifications from PayPal's IPN. The IPN data is saved into
a column family. I add another column, 'processed', which is set to
0.
Every 5 minutes, a cron
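The job being described boils down to a scan-and-mark pattern, sketched here in plain Python (a stand-in for the real pycassa reads and writes; all names are illustrative):

```python
# Scan-and-mark sketch: handle every row whose 'processed' column is
# still "0", then flip the flag so the next cron run skips it.
def process_pending(rows, handle):
    # rows maps row key -> {column: value}; handle does the real work
    for key, cols in rows.items():
        if cols.get("processed") == "0":
            handle(key, cols)
            cols["processed"] = "1"
```

One failure mode worth checking: two overlapping cron runs can both see processed == "0" before either writes the "1" back.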
Yeah, we're processing item similarities. So we are writing single columns
at a time. Although we do batch these into 400 mutations before sending to
Cassy. We currently perform almost 2 billion calculations that then write
almost 4 billion columns.
Once all similarities are calculated, we just
So you only have 1 cassandra node?
If you are interested only in getting the complete work done as fast as
possible before you begin reading, take a look at the new bulk loader in
cassandra:
http://www.datastax.com/dev/blog/bulk-loading
-Jake
On Thu, Aug 18, 2011 at 11:03 AM, Paul Loy
Hi All
I have not been able to list the contents of an existing Column Family:
[default@MyKeySpace] Describe keyspace MyKeySpace;
Keyspace: MyKeySpace:
Replication Strategy:
org.apache.cassandra.locator.NetworkTopologyStrategy
Options: [datacenter1:1]
Column Families:
On Thu, Aug 18, 2011 at 10:36 AM, Stephen Henderson
stephen.hender...@cognitivematch.com wrote:
Thanks Ed/Aaron, that really helped a lot.
Just to clarify on the question of writes (sorry, I worded that badly) - do
write operations insert rows into the cache on all nodes in the replica set
There are a lot of people on 0.7 for whom CL is working as advertised.
Not saying it's impossible that there's a bug, but the odds are
against it.
Is it possible for instance that sometimes your cron job takes longer
than five minutes?
On Thu, Aug 18, 2011 at 9:49 AM, Kyle Gibson
Step 0: use multiple threads to insert
On Thu, Aug 18, 2011 at 10:03 AM, Paul Loy ketera...@gmail.com wrote:
Yeah, we're processing item similarities. So we are writing single columns
at a time. Although we do batch these into 400 mutations before sending to
Cassy. We currently perform almost
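Combining that advice with the 400-mutation batches mentioned above might look like this sketch; send_batch stands in for the real client call and everything else is illustrative:

```python
# Worker threads drain a shared queue and ship columns in fixed-size
# batches; a None sentinel tells each worker to flush and exit.
import queue
import threading

BATCH_SIZE = 400

def worker(q, send_batch):
    batch = []
    while True:
        item = q.get()
        if item is None:            # sentinel: flush remainder and stop
            if batch:
                send_batch(batch)
            return
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            send_batch(batch)
            batch = []

def run(items, send_batch, n_threads=4):
    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, send_batch))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:               # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
```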
Hi Jake,
Thanks a lot.
I have made the modification below as you pointed out.
Once I modify cassandra.yaml, do I need to restart the Cassandra service?
Now I am getting the error below (cassandra.yaml attached):
11/08/18 10:35:18 INFO mapred.JobClient: map 100% reduce 0%
Yup, we do that. We currently have 200 threads that push mutations into a
pool of Mutators (think Pelops - although that was too slow so we rolled our
own much lower-level version). We have around 50 thrift clients through which
mutations are then pushed to Cassandra.
On Thu, Aug 18, 2011 at 4:35
https://issues.apache.org/jira/browse/CASSANDRA-3054
On Wed, Aug 17, 2011 at 7:13 PM, aaron morton aa...@thelastpickle.comwrote:
ooo, didn't know there was a drop index statement.
I got the same result, the Antlr grammar seems to say it's a valid
identifier (not that I have much Antlr foo)…
Yeah, the data after crunching drops to just 65000 columns so one Cassandra
node is plenty. That will all go in memory on one box. It's only the crunching
where we have lots of data and then need it arranged in a structured manner.
That's why I don't use flat files that I just append to. I need them in
Hi
I'm trying to test a single issue:
https://issues.apache.org/jira/browse/CASSANDRA-674
But when I downloaded the patch file I can't find the correct trunk to patch...
Anyone can help me with it? Thanks
Steve
Thanks. I won't try that then.
So in our environment, after upgrading from 0.6.11 to 0.8.4, we have to run
scrub on all nodes before we can run repair on them. Is there any chance
that running scrub on the nodes causes data from all SSTables to be
streamed to/from other nodes when running
Hi,
Is it normal that the repair takes 4+ hours for every node, with only about 10G
data? If this is not expected, do we have any hint what could be causing this?
The ring looks like below, we're using 0.8.1. Our repair is scheduled to run
once per week for all nodes.
Compaction related
I am in the process of trying to tune the memtable flush thresholds for a
particular column family (super column family to be specific) in my
Cassandra 0.8.1 cluster. This CF is reasonably heavily used and getting
flushed roughly every 5-8 minutes which is hardly optimal, particularly
given I have
See http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/,
specifically the section on memtable_total_space_in_mb
On Thu, Aug 18, 2011 at 2:43 PM, Dan Hendry dan.hendry.j...@gmail.com wrote:
I am in the process of trying to tune the memtable flush thresholds for a
particular column
Interesting.
Just to clarify, there are three main conditions which will trigger a flush
(based on data size):
1. The serialized size of a memtable exceeds the per CF memtable_throughput
setting.
2. For a single cf: (serialized size)*(live ratio)*(maximum possible
memtables in memory)
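A rough sketch of that decision in Python - the names, default values, and the exact form of condition 2 are assumptions here, not the actual 0.8.1 code:

```python
# Flush decision sketch: condition 1 compares raw serialized size to
# the per-CF memtable_throughput; condition 2 scales serialized size by
# the live ratio (in-memory bytes per serialized byte) times how many
# memtables could be in memory at once, against the global budget.
def should_flush(serialized_mb, throughput_mb,
                 live_ratio=1.0, max_memtables=3, total_space_mb=None):
    if serialized_mb >= throughput_mb:            # condition 1
        return True
    if total_space_mb is not None:                # condition 2 (sketch)
        if serialized_mb * live_ratio * max_memtables >= total_space_mb:
            return True
    return False
```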
Look in the logs to find out why the migration did not get to node2.
Otherwise yes you can drop those files.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
just found out that
Those numbers sound achievable. As with any scaling exercise, start with the
default config and see how you go; a 5ms response time is certainly reasonable,
as are the throughput numbers.
e.g. If you started with 6 nodes, rf 3, with read repair turned on
20k ops - 12k reads and 8k writes
X3 because of
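Filling in the arithmetic that message starts (assuming every operation touches all RF replicas, e.g. reads with read repair on):

```python
# Back-of-envelope load per node for the example: 6 nodes, RF=3,
# 20k ops/s at a 60:40 read/write split.
nodes, rf = 6, 3
total_ops = 20_000
reads = int(total_ops * 0.6)       # 12,000 reads/s
writes = total_ops - reads         # 8,000 writes/s
replica_ops = total_ops * rf       # "x3 because of" the replication factor
per_node = replica_ops // nodes    # ops/s each node must sustain
```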
Are you using 0.0.0.0 in your mapred config? It should be the external IP of
the cassandra node.
0.0.0.0 is just for the rpc_address config.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 19/08/2011, at 3:35 AM, Thamizh
A couple of thoughts: 400 row mutations in a batch may be a bit high. More is not
always better. Watch the TP stats to see if the mutation pool is backing up
excessively.
Also if you feel like having fun take a look at the durable_writes config
setting for keyspaces, from the cli help…
-
Try a restart on that node with the JVM option
-Dcassandra.load_ring_state=false (see cassandra-env.sh).
If that does not fix it….
http://www.datastax.com/docs/0.8/troubleshooting/index#view-of-ring-differs-between-some-nodes
Cheers
-
Aaron Morton
Freelance Cassandra
No scrub is a local operation only.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 19/08/2011, at 6:36 AM, Huy Le wrote:
Thanks. I won't try that then.
So in our environment, after upgrading from 0.6.11 to 0.8.4, we have
The compaction settings do not affect repair. (Thinking out loud: or do they?
Validation compactions and table builds.)
Watch the logs or check
nodetool compactionstats to see when the Validation compaction completes.
and
nodetool netstats to see how long the data transfer takes
It sounds a
We're trying to bootstrap some new nodes and it appears when adding a new node
that there is a lot of logging on hints being flushed and compacted. It's been
taking about 75 minutes thus far to bootstrap for only about 10 GB of data.
It's ballooned up to over 40 GB on the new node. I do 'ls
Hi,
version - 0.7.4
cluster size = 3
RF = 3.
data size on a node ~500G
I want to do some disk maintenance on a cassandra node, so the process that
I came up with is
- drain this node
- back up the system data space
- rebuild the disk partition
- copy data from another node
- copy
I'm reading the commitLog code since I have some similar logic in my
application code,
so that I could benefit from the same techniques that CommitLog code uses.
I see that
CommitLog.add(RowMutation rowMutation) {
executor.add(new LogRecordAdder(rowMutation));
}
while executor could be
thanks Jonathan, found it
public BatchCommitLogExecutorService(int queueSize)
{
    queue = new LinkedBlockingQueue<CheaterFutureTask>(queueSize);
    ...
    appendingThread = new Thread(runnable, "COMMIT-LOG-WRITER");
    appendingThread.start();
}
On Thu, Aug
I've been using Cassandra as a database storage device for a service based
application.
I'm wondering if you can design a multi-tiered cassandra cluster that is
used by both clients and servers.
I'd like to have the ability to setup the following:
*Implement a Core Seed Servers / Nodes internal
Why not use couchdb for this use case?
Milind
/***
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
/
On Aug 18, 2011 9:07 PM, Nicholas Neuberger nneuberg...@gmail.com wrote:
I've been using Cassandra as a
Hi
I am using 0.7.4 and am seeing this exception in my logs a few times a day.
Should I be worried? Or is this just an intermittent network disconnect?
ERROR [RequestResponseStage:257] 2011-08-19 03:05:30,706
AbstractCassandraDaemon.java (line 112) Fatal exception in thread
You should get on 0.7.8 while you are doing this; this is a pretty good reason:
https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58
Never done a repair on this cluster before, is that a problem?
Potentially.
Repair will ensure that your data is distributed, and that
OrientDB may be a perfect fit for you - a little bit like Couch, but on
Java. We use it too, and it's super fast.
2011/8/19 Milind Parikh milindpar...@gmail.com
Why not use couchdb for this use case?
Milind
/***
sent from my android...please pardon occasional typos as I