I ran the tests with only one machine, so CL_ONE is not the problem. Am
I right?
2013/1/15 Hiller, Dean dean.hil...@nrel.gov
What is your consistency level set to? If you set it to CL_ONE, you could
get different results. Or is your database constant and unchanging?
Dean
From: Sávio
Thanks all for the clarification and your views. Queues and async are definitely
the way to go. Anyway, I'll take the pull+aggregate approach for now; it should
work better for a start. (If someone has the same follows-app problem, there is
great research:
Hi,
I am trying to understand the read path in Cassandra. I've read Cassandra's
documentation and it seems that the read path is like this:
- The client contacts a proxy node, which performs the operation on a
certain object
- Proxy node sends requests to every replica of that object
- Replica
Hi,
I know that the DataStax Enterprise package provides Brisk, but is there a
community version? Is it easy to interface Hadoop with Cassandra as the storage,
or do we absolutely have to use Brisk for that?
I know CassandraFS is natively available in Cassandra 1.2, the version I use,
so is there
We have multiple clients reading the same row key. It makes no sense to fail
on one machine. When we use Thrift, Cassandra always returns the correct
result.
2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br
I ran the tests with only one machine, so CL_ONE is not the problem.
Am I right?
Here are a few examples I have worked on, reading from xml.gz files then
writing to Cassandra.
https://github.com/jschappet/medline
You will also need:
https://github.com/jschappet/medline-base
These examples are Hadoop jobs using Cassandra as the data store.
This one is a good place to
I don't want to write to Cassandra as it replicates data from another
datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read data from
it. I would like to use the same configuration as
http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but
I want to know
Try this one then: it reads from Cassandra, then writes back to Cassandra,
but you could change the write to wherever you would like.
getConf().set(IN_COLUMN_NAME, columnName);
Job job = new Job(getConf(), "ProcessRawXml");
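For anyone following along, here is a minimal sketch of what the surrounding job setup typically looks like (the keyspace/CF names "medline"/"xml" and the host settings are my assumptions, not taken from the repositories above):
```
import java.util.Arrays;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.mapreduce.Job;

// Inside a Tool's run() method: wire the Cassandra connection details
// into the Hadoop job configuration.
Job job = new Job(getConf(), "ProcessRawXml");
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setInputPartitioner(job.getConfiguration(),
        "org.apache.cassandra.dht.RandomPartitioner");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), "medline", "xml");

// Restrict the input to the single column the mapper reads.
SlicePredicate pred = new SlicePredicate()
        .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), pred);
```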
Hello,
We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or
possibly 1.1.9 depending on timing). It is my understanding that
rolling upgrades of Cassandra are supported, so as we upgrade our
cluster, we can do so one node at a time without experiencing downtime.
Has anyone
You're missing the correct definition of read_repair_chance.
When you do a read at CL.ALL, all replicas are waited upon and the results
from all those replicas are compared. From that, we can extract which nodes
are not up to date, i.e. which ones can be read repaired. And if some nodes
need to be
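To make that concrete, a minimal Thrift sketch of such a read (the row key, CF name, and slice size are illustrative assumptions; `client` is an already-opened Cassandra.Client): a read at ConsistencyLevel.ALL waits on every replica, so a stale replica is detected and repaired as part of the read.
```
import java.util.List;

import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;

// Read the first 100 columns of a row at CL.ALL: every replica must answer,
// so any replica that is out of date is detected and repaired by the read.
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(
        ByteBufferUtil.EMPTY_BYTE_BUFFER, // from the start of the row
        ByteBufferUtil.EMPTY_BYTE_BUFFER, // to the end of the row
        false, 100));
List<ColumnOrSuperColumn> row = client.get_slice(
        ByteBufferUtil.bytes("someKey"),
        new ColumnParent("MyCF"),
        predicate,
        ConsistencyLevel.ALL);
```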
Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to
run nodetool upgradesstables if your CF has counters.
On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote:
Hello,
We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or
possibly 1.1.9 depending on
Thanks for pointing that out.
Given that upgradesstables can only be run on a live node, does anyone know
if there is a danger in having this node in the cluster while this is
being performed? Also, can anyone confirm this only needs to be done on
counter column families, or all column
Ah, OK. Now I understand where the data came from. When using CL.ALL,
read repair always repairs inconsistent data.
Thanks a lot, Sylvain.
Carlos Pérez Miguel
2013/1/17 Sylvain Lebresne sylv...@datastax.com
You're missing the correct definition of read_repair_chance.
When you do a read
upgradesstables is safe, but it is essentially a compaction (because SSTables
are immutable, it rewrites the SSTable in the new format), so you'll want to do
it when traffic is low to avoid IO issues.
upgradesstables always needs to be done between majors. While 1.1.2 to 1.1.8 is
not a major, due
Hi there,
I am sorry to get into this thread with more questions, but isn't the
gossip protocol in charge of doing the read repair automatically
anytime a new node comes into the ring? I mean, if a node is down and then
we get that node up and running again, wouldn't it be synchronized
automatically?
When you view OpsCenter metrics, you're generating a small number of reads
to fetch the metric data, which is why your read count is near zero instead
of actually being zero. Since reads are still occurring, Cassandra will
continue to show a read latency. Basically, you're just viewing the
I mean, if a node is down and then
we get that node up and running again, wouldn't it be synchronized
automatically?
It will, thanks to hinted handoff (not gossip; gossip only handles the ring
topology and a bunch of metadata, it doesn't deal with data synchronization
at all). But hinted handoff
Hmm, that makes sense, but then why is the latency for the reads that get the
metric often so high (several thousand usecs), and why does it so closely
track the latency of my normal reads?
On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote:
When you view OpsCenter metrics,
Thanks for the explanation Sylvain!
2013/1/16 Sylvain Lebresne sylv...@datastax.com:
I mean, if a node is down and then
we get that node up and running again, wouldn't it be synchronized
automatically?
It will, thanks to hinted handoff (not gossip; gossip only handles the ring
topology and a
We have quite wide rows and do a lot of concentrated processing on each
row... so I thought I'd try the row cache on one node in my cluster to see
if I could detect an effect of using it.
The problem is that nodetool info says that even with a two-gig row_cache
we're getting zero requests. Since
Brisk is pretty much stagnant. I think someone forked it to work with 1.0
but not sure how that is going. You'll need to pay for DSE to get CFS
(which is essentially Brisk) if you want to use any modern version of C*.
Best,
Michael
On 1/16/13 11:17 AM, cscetbon@orange.com
On Cassandra 1.1.5 with a write-heavy workload, we're having problems
getting rows compacted away (removed) even though all columns have
expired TTLs. We've tried size-tiered and now leveled compaction and are
seeing the same symptom: the data stays around essentially forever.
Currently we write all
What I mean is: is there a way of doing this but using Hector?
-
public static void main(String[] args) throws Exception {
Connector conn = new Connector();
Cassandra.Client
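Since the raw-Thrift code above is truncated, here is only a generic Hector sketch of connecting and writing a column (cluster, keyspace, CF, and column names are all assumptions), not a translation of the original main():
```
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class HectorExample {
    public static void main(String[] args) {
        // Hector manages the underlying Thrift connections for you.
        Cluster cluster = HFactory.getOrCreateCluster("test-cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);

        // Insert one string column into one row of an existing CF.
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("rowKey", "MyCF",
                HFactory.createStringColumn("colName", "colValue"));

        HFactory.shutdownCluster(cluster);
    }
}
```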
William,
I just saw your message today. I am using Cassandra + Amazon EMR
(hadoop 1.0.3) but I am not using PIG as you are. I set my configuration
vars in Java, as I have a custom jar file and I am using
ColumnFamilyInputFormat.
However, if I understood your problem well, the only thing
Here is the point. You're right, this GitHub repository has not been updated
for a year and a half. I thought Brisk was just a bundle of some technologies
and that it was possible to install the same components and make them work
together without using this bundle :(
On Jan 16, 2013, at 8:22
Hello,
I am currently using Hadoop + Cassandra on Amazon AWS. Cassandra runs on
EC2 and my Hadoop process runs on EMR. For Cassandra storage, I am using
local EC2 EBS disks.
My system is running fine for my tests, but to me it's not a good setup
for production. I need my system to perform
After searching for a while, I found what I was looking for [1].
Hope it helps someone else (:
Renato M.
[1] http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com:
What I mean is that if there is a way of
DataStax recommended (I forget the reference) using the ephemeral disks in
RAID0, which is what I've been running for well over a year now in
production.
In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the
split data center thing (one DC for low latency queries, one DC for
We use Cassandra on ephemeral drives. Yes, that means we need more nodes to
hold more data, but doesn't that play into Cassandra's strengths?
It sounds like you're trying to vertically scale your Cassandra cluster.
On Jan 16, 2013, at 12:42 PM, Marcelo Elias Del Valle wrote:
Hello,
I
Storage size is not a problem; you can always add more nodes. Anyway, it is
not recommended to have nodes with more than 500GB (compaction and repair take
forever). EC2 m1.large has 800GB of ephemeral storage, EC2 m1.xlarge 1.6TB.
I'd recommend xlarge: it has 4 CPUs, so maintenance procedures don't
That's good info! Thanks!
2013/1/16 William Oberman ober...@civicscience.com
DataStax recommended (I forget the reference) using the ephemeral disks in
RAID0, which is what I've been running for well over a year now in
production.
In terms of how I'm doing Cassandra/AWS/Hadoop, I started by
We're currently using Cassandra on EC2 at very low scale (a 2-node
cluster on m1.large instances in two regions). I don't believe that
EBS is recommended, for performance reasons. Also, it's proven to be
very unreliable in the past (most of the big/notable AWS outages were
due to EBS issues). We've
Hi,
I have a strange behavior I am not able to understand.
I have 6 nodes with cassandra-1.0.12. Each node has 8GB of RAM. I have a
replication factor of 3.
---
My story is maybe too long, so I'm trying a shorter version here, while saving
what I wrote in case someone has the patience to read my bad
On Wed, Jan 16, 2013 at 2:37 PM, cscetbon@orange.com wrote:
Here is the point. You're right this github repository has not been updated
for a year and a half. I thought brisk was just a bundle of some technologies
and that it was possible to install the same components and make them work
A few milliseconds (or a few thousand usecs) isn't terribly high,
considering that number includes at least one round trip between nodes.
I'm not sure about the tracking behavior that you're describing -- could
you provide some more details or perhaps screenshots?
On Wed, Jan 16, 2013 at 12:16
You have to change the column family caching setting from keys_only to
rows_only or all; otherwise the row cache will not be on for this CF.
On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com
wrote:
We have quite wide rows and do a lot of concentrated processing on each
row...so I thought I'd try the
I think at this point Cassandra startup scripts should reject unsupported JVM
versions, since Cassandra won't even start with many JVMs at this point.
On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com
wrote:
Do yourself a favor and get a copy of the Oracle 7 JDK (now with more
security
To get a column removed, you have to meet two requirements:
1. The column should be expired.
2. After that, the CF gets compacted.
I guess your expired columns are propagated to a high-tier CF, which gets
compacted rarely.
So you have to wait until the high-tier CF gets compacted.
Andrey
On Wed, Jan 16, 2013 at
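As a concrete illustration of requirement 1, a minimal Thrift sketch (CF, row, and column names are assumptions; `client` is an open Cassandra.Client): the TTL only marks the column expired; the space comes back when requirement 2, a compaction of the SSTable holding it, happens.
```
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.utils.ByteBufferUtil;

// Write a column that expires after one day. It stops being returned by reads
// once the TTL passes, but it is only physically removed when the SSTable
// containing it is compacted.
Column col = new Column(ByteBufferUtil.bytes("colName"));
col.setValue(ByteBufferUtil.bytes("value"));
col.setTimestamp(System.currentTimeMillis() * 1000L); // microseconds
col.setTtl(86400);                                    // seconds
client.insert(ByteBufferUtil.bytes("rowKey"),
        new ColumnParent("MyCF"), col, ConsistencyLevel.QUORUM);
```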
According to the timestamps (see original post), the SSTable was written
(thus compacted) 3 days after all columns for that row had
expired and 6 days after the row was created; yet all columns are still
showing up in the SSTable. Note that no rows show up when a get for that
Just an FYI --
We will be hosting a webinar tomorrow demonstrating the use of Storm
as a distributed processing layer on top of Cassandra.
I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra.
http://www.datastax.com/resources/webinars/collegecredit
It is part of the
Any idea whether interoperability between Thrift and CQL should work properly
in 1.2?
AFAIK the only incompatibility is CQL 3 between pre 1.2 and 1.2.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 16/01/2013, at 1:24
The thrift request is not sending a composite type where it should. CQL 3 uses
composites in a lot of places.
What was your table definition?
Are you using a high level client or rolling your own?
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
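For reference, a minimal sketch of the on-wire CompositeType encoding that a Thrift write against a CQL 3 table needs to produce for each column name (a single text component here; this is my reconstruction of the format, not code from the thread): each component is a 2-byte big-endian length, the raw bytes, then one end-of-component byte.
```
import java.nio.ByteBuffer;

import org.apache.cassandra.utils.ByteBufferUtil;

// Encode the single-component composite column name "full_name":
// <2-byte length><bytes><end-of-component byte (0)>.
ByteBuffer component = ByteBufferUtil.bytes("full_name");
ByteBuffer composite = ByteBuffer.allocate(2 + component.remaining() + 1);
composite.putShort((short) component.remaining());
composite.put(component.duplicate());
composite.put((byte) 0);
composite.flip();
// Use `composite` as the Thrift column name instead of the raw bytes.
```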
You *may* be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-2503
It was implemented in 1.1.0, but perhaps data in the original cluster is more
compacted than in the new one.
Are the increases for all CFs or just a few?
Do you have a workload of infrequent writes to rows followed by
If you think you have located a bug in Astyanax please submit it to
https://github.com/Netflix/astyanax
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 17/01/2013, at 3:44 AM, Sávio Teles savio.te...@lupa.inf.ufg.br
Check the disk utilisation using iostat -x 5
If you are on a VM / in the cloud check for CPU steal.
Check the logs for messages from the GCInspector, the ParNew events are times
the JVM is paused.
Look at the times dropped messages are logged and try to correlate them with
other server events.
Minor compaction (with Size Tiered) will only purge tombstones if all fragments
of a row are contained in the SSTables being compacted. So if you have a
long-lived row that is present in many size tiers, the columns will not be
purged.
(thus compacted) 3 days after all columns for
One solution is to only read up to (now - 1 second). If this is a public
API where you want to guarantee full consistency (i.e., if you have added a
message to the queue, it will definitely appear to be there), you can
instead delay requests for 1 second before reading up to the moment that
the
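A minimal sketch of the bounded slice this describes (assuming the queue columns are named by microsecond timestamps stored as longs; all names are illustrative):
```
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;

// Only read columns up to (now - 1 second) so that writes still in flight,
// whose timestamps may be slightly in the past, cannot appear "behind"
// something we have already read.
long nowMicros = System.currentTimeMillis() * 1000L;
long upperBound = nowMicros - 1000000L; // one second of slack, in microseconds

SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(
        ByteBufferUtil.bytes(0L),          // from the start of the row
        ByteBufferUtil.bytes(upperBound),  // up to now - 1s only
        false, 1000));
```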
A delayed read is acceptable, but the problem is still there:
Request A comes to node One at local time 10:00:01.000 PM.
Request B comes to node Two at local time 10:00:00.980 PM.
The correct order is A before B.
I am not sure how node C will handle the data: although A came before B,
B's timestamp is earlier.
Hi Aaron,
I am using the Thrift client.
Here is the column family creation script:
```
String colFamily = "CREATE COLUMNFAMILY users (key varchar PRIMARY KEY, full_name varchar, birth_date int, state varchar);";
```
I'm not sure I fully understand your problem. You seem to be talking about
ordering the requests in the order they are generated. But in that case,
you will rely on the ordering of columns within whatever row you store
requests A and B in, and that order depends on the column names, which in
turn is
Yes, Sylvain, you are correct.
When I say A comes before B, it means the client will ensure the order;
actually, B will be sent only after getting the response to request A.
And yes, A and B are not updating the same record, so it is not the typical
Cassandra consistency problem.
And yes, the column name is provide