Re: Astyanax returns empty row

2013-01-16 Thread Sávio Teles
I ran the tests with only one machine, so CL_ONE is not the problem. Am I right? 2013/1/15 Hiller, Dean dean.hil...@nrel.gov What is your consistency level set to? If you set it to CL_ONE, you could get different results or is your database constant and unchanging? Dean From: Sávio

Re: How many BATCH inserts in to many?

2013-01-16 Thread Alan Ristić
Thanks all for the clarification and your views. Queues and async are definitely the way to go. Anyway, I'll take the pull+aggregate approach for now; it should work better for a start. (If someone has the same "follows" app problem, there is some great research:

read path, I have missed something

2013-01-16 Thread Carlos Pérez Miguel
Hi, I am trying to understand the read path in Cassandra. I've read Cassandra's documentation and it seems that the read path is like this: - Client contacts a proxy node, which performs the operation on a certain object - Proxy node sends requests to every replica of that object - Replica

Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Hi, I know that the DataStax Enterprise package provides Brisk, but is there a community version? Is it easy to interface Hadoop with Cassandra as the storage, or do we absolutely have to use Brisk for that? I know CassandraFS is natively available in Cassandra 1.2, the version I use, so is there

Re: Astyanax returns empty row

2013-01-16 Thread Sávio Teles
We have multiple clients reading the same row key. It makes no sense for it to fail on one machine. When we use Thrift, Cassandra always returns the correct result. 2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br I ran the tests with only one machine, so CL_ONE is not the problem. Am I right?

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Here are a few examples I have worked on, reading from xml.gz files then writing to Cassandra. https://github.com/jschappet/medline You will also need: https://github.com/jschappet/medline-base These examples are Hadoop jobs using Cassandra as the data store. This one is a good place to

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
I don't want to write to Cassandra as it replicates data from another datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read data from it. I would like to use the same configuration as http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but I want to know

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Try this one then. It reads from Cassandra, then writes back to Cassandra, but you could change the write to wherever you would like. getConf().set(IN_COLUMN_NAME, columnName ); Job job = new Job(getConf(), ProcessRawXml);

Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Mike
Hello, We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or possibly 1.1.9 depending on timing). It is my understanding that rolling upgrades of Cassandra are supported, so as we upgrade our cluster, we can do so one node at a time without experiencing downtime. Has anyone

Re: read path, I have missed something

2013-01-16 Thread Sylvain Lebresne
You're missing the correct definition of read_repair_chance. When you do a read at CL.ALL, all replicas are waited upon and the results from all those replicas are compared. From that, we can extract which nodes are not up to date, i.e. which ones can be read repaired. And if some node needs to be
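
The comparison described above can be sketched roughly as follows. This is a simplified model, not Cassandra's implementation: `ReplicaResponse` and `find_stale_replicas` are hypothetical names, and each replica is assumed to return a single value with a write timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    """Hypothetical stand-in for what one replica returns for a column."""
    replica: str
    value: str
    timestamp: int  # write timestamp; the highest one wins


def find_stale_replicas(responses):
    """Return the winning response and the replicas holding older data."""
    newest = max(responses, key=lambda r: r.timestamp)
    stale = [r.replica for r in responses if r.timestamp < newest.timestamp]
    return newest, stale


# At CL.ALL every replica answers, so all responses can be compared.
responses = [
    ReplicaResponse("node1", "v2", 200),
    ReplicaResponse("node2", "v1", 100),  # missed the latest write
    ReplicaResponse("node3", "v2", 200),
]
newest, stale = find_stale_replicas(responses)
print(newest.value, stale)
```

In this model the replicas listed in `stale` are the ones that would be candidates for read repair.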

Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Jason Wee
Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run nodetool upgradesstables if your CF has counters. On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote: Hello, We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or possibly 1.1.9 depending on

Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Mike
Thanks for pointing that out. Given upgradesstables can only be run on a live node, does anyone know if there is a danger of having this node in the cluster while this is being performed? Also, can anyone confirm this only needs to be done on counter column families, or all column

Re: read path, I have missed something

2013-01-16 Thread Carlos Pérez Miguel
Ah, OK. Now I understand where the data came from. When using CL.ALL, read_repair always repairs inconsistent data. Thanks a lot, Sylvain. Carlos Pérez Miguel 2013/1/17 Sylvain Lebresne sylv...@datastax.com You're missing the correct definition of read_repair_chance. When you do a read

Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Michael Kjellman
upgradesstables is safe, but it is essentially compaction (because sstables are immutable, it rewrites the sstable in the new format), so you'll want to do it when traffic is low to avoid IO issues. upgradesstables always needs to be done between majors. While 1.1.2 to 1.1.8 is not a major, due

Re: read path, I have missed something

2013-01-16 Thread Renato Marroquín Mogrovejo
Hi there, I am sorry to get into this thread with more questions but isn't the gossip protocol in charge of making the read_repair automatically anytime a new node comes into the ring? I mean if a node is down, then we get that node up and running again, wouldn't it be synchronized automatically?

Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Tyler Hobbs
When you view OpsCenter metrics, you're generating a small number of reads to fetch the metric data, which is why your read count is near zero instead of actually being zero. Since reads are still occurring, Cassandra will continue to show a read latency. Basically, you're just viewing the

Re: read path, I have missed something

2013-01-16 Thread Sylvain Lebresne
I mean if a node is down, then we get that node up and running again, wouldn't it be synchronized automatically? It will, thanks to hinted handoff (not gossip; gossip only handles the ring topology and a bunch of metadata, it doesn't deal with data synchronization at all). But hinted handoff

Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Brian Tarbox
Hmm, that makes sense, but then why is the latency for the reads that get the metric often so high (several thousand uSecs) and why does it so closely track the latency of my normal reads? On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote: When you view OpsCenter metrics,

Re: read path, I have missed something

2013-01-16 Thread Renato Marroquín Mogrovejo
Thanks for the explanation, Sylvain! 2013/1/16 Sylvain Lebresne sylv...@datastax.com: I mean if a node is down, then we get that node up and running again, wouldn't it be synchronized automatically? It will, thanks to hinted handoff (not gossip; gossip only handles the ring topology and a

trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Brian Tarbox
We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the row cache on one node in my cluster to see if I could detect an effect of using it. The problem is that nodetool info says that even with a two gig row_cache we're getting zero requests. Since

unsubscribe

2013-01-16 Thread Leonid Ilyevsky
Leonid Ilyevsky Moon Capital Management, LP 499 Park Avenue New York, NY 10022 P: (212) 652-4586 F: (212) 652-4501 E: lilyev...@mooncapital.com This email, along with any attachments, is confidential and may be legally

Re: unsubscribe

2013-01-16 Thread Michael Kjellman
Writing to the list: user@cassandra.apache.org Subscription address: user-subscr...@cassandra.apache.org Digest subscription address: user-digest-subscr...@cassandra.apache.org Unsubscription addresses: user-unsubscr...@cassandra.apache.org Getting help with the list

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Michael Kjellman
Brisk is pretty much stagnant. I think someone forked it to work with 1.0 but not sure how that is going. You'll need to pay for DSE to get CFS (which is essentially Brisk) if you want to use any modern version of C*. Best, Michael On 1/16/13 11:17 AM, cscetbon@orange.com

LCS not removing rows with all TTL expired columns

2013-01-16 Thread Bryan Talbot
On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all

Re: Query column names

2013-01-16 Thread Renato Marroquín Mogrovejo
What I mean is whether there is a way of doing this, but using Hector: public static void main(String[] args) throws Exception { Connector conn = new Connector(); Cassandra.Client

Re: AWS EMR - Cassandra

2013-01-16 Thread Marcelo Elias Del Valle
William, I just saw your message today. I am using Cassandra + Amazon EMR (Hadoop 1.0.3) but I am not using Pig as you are. I set my configuration vars in Java, as I have a custom jar file and I am using ColumnFamilyInputFormat. However, if I understood your problem well, the only thing

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Here is the point. You're right this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work together without using this bundle :( On Jan 16, 2013, at 8:22

Cassandra at Amazon AWS

2013-01-16 Thread Marcelo Elias Del Valle
Hello, I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform

Re: Query column names

2013-01-16 Thread Renato Marroquín Mogrovejo
After searching for a while I found what I was looking for [1]. Hope it helps someone else (: Renato M. [1] http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1 2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com: What I mean is that if there is a way of

Re: AWS EMR - Cassandra

2013-01-16 Thread William Oberman
DataStax recommended (forget the reference) to use the ephemeral disks in RAID0, which is what I've been running for well over a year now in production. In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the split data center thing (one DC for low latency queries, one DC for

Re: Cassandra at Amazon AWS

2013-01-16 Thread Ben Chobot
We use cassandra on ephemeral drives. Yes, that means we need more nodes to hold more data, but doesn't that play into cassandra's strengths? It sounds like you're trying to vertically scale your cassandra cluster. On Jan 16, 2013, at 12:42 PM, Marcelo Elias Del Valle wrote: Hello, I

Re: Cassandra at Amazon AWS

2013-01-16 Thread Andrey Ilinykh
Storage size is not a problem; you can always add more nodes. Anyway, it is not recommended to have nodes with more than 500G (compaction and repair take forever). EC2 m1.large has 800G of ephemeral storage, EC2 m1.xlarge 1.6T. I'd recommend xlarge, it has 4 CPUs, so maintenance procedures don't

Re: AWS EMR - Cassandra

2013-01-16 Thread Marcelo Elias Del Valle
That's good info! Thanks! 2013/1/16 William Oberman ober...@civicscience.com DataStax recommended (forget the reference) to use the ephemeral disks in RAID0, which is what I've been running for well over a year now in production. In terms of how I'm doing Cassandra/AWS/Hadoop, I started by

Re: Cassandra at Amazon AWS

2013-01-16 Thread Jared Biel
We're currently using Cassandra on EC2 at very low scale (a 2 node cluster on m1.large instances in two regions.) I don't believe that EBS is recommended for performance reasons. Also, it's proven to be very unreliable in the past (most of the big/notable AWS outages were due to EBS issues.) We've

Cassandra timeout whereas it is not much busy

2013-01-16 Thread Nicolas Lalevée
Hi, I am seeing a strange behavior that I am not able to understand. I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a replication factor of 3. --- my story is maybe too long, trying shorter here, while saving what I wrote in case someone has patience to read my bad

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Brandon Williams
On Wed, Jan 16, 2013 at 2:37 PM, cscetbon@orange.com wrote: Here is the point. You're right this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work

Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Tyler Hobbs
A few milliseconds (or a few thousand usecs) isn't terribly high, considering that number includes at least one round trip between nodes. I'm not sure about the tracking behavior that you're describing -- could you provide some more details or perhaps screenshots? On Wed, Jan 16, 2013 at 12:16

Re: trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Edward Capriolo
You have to change the column family cache info from keys_only to otherwise the cache will not be on for this CF. On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com wrote: We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the

Re: Starting Cassandra

2013-01-16 Thread Edward Capriolo
I think at this point Cassandra startup scripts should reject versions, since Cassandra won't even start with many JVMs at this point. On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com wrote: Do yourself a favor and get a copy of the Oracle 7 JDK (now with more security

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Andrey Ilinykh
To get a column removed, two requirements have to be met: 1. the column should be expired; 2. after that, the CF gets compacted. I guess your expired columns are propagated to a high tier CF, which gets compacted rarely. So, you have to wait until the high tier CF gets compacted. Andrey On Wed, Jan 16, 2013 at

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Bryan Talbot
According to the timestamps (see original post) the SSTable was written (thus compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all columns are still showing up in the SSTable. Note that the column shows no rows when a get for that

Webinar: Using Storm for Distributed Processing on Cassandra

2013-01-16 Thread Brian O'Neill
Just an FYI -- We will be hosting a webinar tomorrow demonstrating the use of Storm as a distributed processing layer on top of Cassandra. I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra. http://www.datastax.com/resources/webinars/collegecredit It is part of the

Re: Cassandra 1.2 thrift migration

2013-01-16 Thread aaron morton
Any idea whether interoperability b/w Thrift and CQL should work properly in 1.2? AFAIK the only incompatibility is CQL 3 between pre 1.2 and 1.2. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/01/2013, at 1:24

Re: error when creating column family using cql3 and persisting data using thrift

2013-01-16 Thread aaron morton
The thrift request is not sending a composite type where it should. CQL 3 uses composites in a lot of places. What was your table definition? Are you using a high level client or rolling your own? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton

Re: write count increase after 1.2 update

2013-01-16 Thread aaron morton
You *may* be seeing this https://issues.apache.org/jira/browse/CASSANDRA-2503 It was implemented in 1.1.0 but perhaps data in the original cluster is more compacted than the new one. Are the increases for all CFs or just a few? Do you have a workload of infrequent writes to rows followed by

Re: Astyanax returns empty row

2013-01-16 Thread aaron morton
If you think you have located a bug in Astyanax please submit it to https://github.com/Netflix/astyanax Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 17/01/2013, at 3:44 AM, Sávio Teles savio.te...@lupa.inf.ufg.br

Re: Cassandra timeout whereas it is not much busy

2013-01-16 Thread aaron morton
Check the disk utilisation using iostat -x 5. If you are on a VM / in the cloud, check for CPU steal. Check the logs for messages from the GCInspector; the ParNew events are times the JVM is paused. Look at the times dropped messages are logged and try to correlate them with other server events.

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread aaron morton
Minor compaction (with Size Tiered) will only purge tombstones if all fragments of a row are contained in the SSTables being compacted. So if you have a long-lived row that is present in many size tiers, the columns will not be purged. (thus compacted) 3 days after all columns for
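
The purge condition stated above can be illustrated with a small sketch. It is a simplified model only: the `can_purge_row` helper and the SSTable layout are hypothetical, and other conditions that apply in practice (such as the grace period) are deliberately left out.

```python
def can_purge_row(row_key, compacting, all_sstables):
    """True only if every SSTable holding a fragment of the row is being compacted."""
    holders = {name for name, keys in all_sstables.items() if row_key in keys}
    return holders <= set(compacting)


# Hypothetical layout: SSTable name -> row keys it contains fragments of.
sstables = {
    "small-1": {"rowA", "rowB"},
    "small-2": {"rowB"},
    "big-1": {"rowA"},  # long-lived row also present in a larger tier
}
compacting = ["small-1", "small-2"]

print(can_purge_row("rowB", compacting, sstables))  # all fragments are included
print(can_purge_row("rowA", compacting, sstables))  # a fragment lives in big-1
```

Here `rowA` keeps its expired columns until a compaction happens to include `big-1` as well, which matches the long-lived row case described above.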

Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Russell Haering
One solution is to only read up to (now - 1 second). If this is a public API where you want to guarantee full consistency (ie, if you have added a message to the queue, it will definitely appear to be there) you can instead delay requests for 1 second before reading up to the moment that the
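
A minimal sketch of the "read up to (now - 1 second)" idea, assuming messages are stored as (timestamp, payload) pairs. The `readable_messages` function and the `SAFETY_WINDOW_SECONDS` constant are illustrative names, not part of any Cassandra API.

```python
import time

SAFETY_WINDOW_SECONDS = 1.0  # assumed bound on clock skew plus write latency


def readable_messages(messages, now=None):
    """Return messages whose timestamp is at or before (now - safety window)."""
    if now is None:
        now = time.time()
    cutoff = now - SAFETY_WINDOW_SECONDS
    return [m for m in messages if m[0] <= cutoff]


# Timestamps are in seconds; the newest message is still inside the window.
queue = [(100.0, "a"), (100.5, "b"), (100.9, "c")]
print(readable_messages(queue, now=101.6))
```

Messages newer than the cutoff are not lost; they simply become visible to a later read once the window has passed.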

Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
Delayed reads are acceptable, but the problem is still there: request A comes to node One at local time PM 10:00:01.000; request B comes to node Two at local time PM 10:00:00.980. The correct order is A, then B. I am not sure how node C will handle the data: although A came before B, B's timestamp is earlier
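
The scenario above can be reproduced in a few lines. This is only an illustration of the clock-skew effect using the two timestamps from the message (expressed as milliseconds within the minute); the tuple layout is an assumption made for the example.

```python
# Each event: (label, true arrival order, local-clock timestamp in ms).
events = [
    ("A", 1, 1000),  # node One's clock reads 10:00:01.000
    ("B", 2, 980),   # node Two's clock is behind and reads 10:00:00.980
]

# Order as the client actually issued the requests.
by_arrival = [label for label, order, _ in sorted(events, key=lambda e: e[1])]

# Order obtained when sorting on the locally generated timestamps.
by_timestamp = [label for label, _, ts in sorted(events, key=lambda e: e[2])]

print(by_arrival, by_timestamp)
```

Sorting on the node-local timestamps places B before A, even though A was sent first.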

Re: error when creating column family using cql3 and persisting data using thrift

2013-01-16 Thread Kuldeep Mishra
Hi Aaron, I am using thrift client. Here is column family creation script:- ``` String colFamily = CREATE COLUMNFAMILY users (key varchar PRIMARY KEY,full_name varchar, birth_date int,state varchar);

Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Sylvain Lebresne
I'm not sure I fully understand your problem. You seem to be talking of ordering the requests, in the order they are generated. But in that case, you will rely on the ordering of columns within whatever row you store request A and B in, and that order depends on the column names, which in turn is

Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
Yes, Sylvain, you are correct. When I say A comes before B, it means the client will secure the order; actually, B will be sent only after getting the response to request A. And yes, A and B do not update the same record, so it is not a typical Cassandra consistency problem. And yes, the column name is provide