Re: sstable2json and resurrected rows

2012-04-02 Thread Jonas Borgström
On 2012-03-31 08:45, Zhu Han wrote: Did you hit the bug here? https://issues.apache.org/jira/browse/CASSANDRA-4054 Yes, looks like it. But what confuses me most is not the sstable2json bug but why the major compaction does not replace the deleted row data with a tombstone. Is that a bug

Largest 'sensible' value

2012-04-02 Thread Franc Carter
Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the latency of HDFS, as our use case is not for reading all the data and hence we will need custom RecordReaders etc. I've seen a couple of

Error Replicate on write

2012-04-02 Thread Carlos Juzarte Rolo
Hi, I've been using cassandra for a while, but after an upgrade to 1.0.7, every machine kept running perfectly. Well, except one that constantly throws this error: ERROR [ReplicateOnWriteStage:39] 2012-04-02 12:02:55,131 AbstractCassandraDaemon.java (line 139) Fatal exception in thread

Row iteration using RandomPartitioner

2012-04-02 Thread christopher-t.ng
Hi, Bit of a silly question, is row iteration using the RandomPartitioner deterministic? I don't particularly care what the order is relative to the row keys (obviously there isn't one, it's the RandomPartitioner), but if I run a full iteration over all rows in a CF twice, assuming no underlying

Re: Row iteration using RandomPartitioner

2012-04-02 Thread Jake Luciani
Correct. Random partitioner order is md5 token order. If you make no changes you will get the same order. On Apr 2, 2012, at 7:53 AM, christopher-t...@ubs.com wrote: Hi, Bit of a silly question, is row iteration using the RandomPartitioner deterministic? I don't particularly care what
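
For anyone wanting to convince themselves, here is a minimal Python sketch of the idea: md5-derived tokens give a stable, repeatable iteration order even though it looks random relative to the keys. (The token construction below is a simplification of what RandomPartitioner actually does internally.)

```python
import hashlib

def md5_token(key: bytes) -> int:
    # RandomPartitioner derives each row's token from the MD5 hash of its key
    return int.from_bytes(hashlib.md5(key).digest(), "big")

keys = [b"alice", b"bob", b"carol", b"dave"]

# Iteration order is token order, not key order -- but it is deterministic:
order1 = sorted(keys, key=md5_token)
order2 = sorted(keys, key=md5_token)
assert order1 == order2
```

So two full iterations over an unchanged CF visit the rows in the same (token) order.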

Cassandra CF merkle tree

2012-04-02 Thread Thomas van Neerijnen
Hi all Is there a way I can easily retrieve a Merkle tree for a CF, like the one created during a repair? I didn't see anything about this in the Thrift API docs, I'm assuming this is a data structure made available only to internal Cassandra functions. I would like to explore using the Merkle

Re: Using Thrift

2012-04-02 Thread Dave Brosius
For a thrift client, you need the following jars at a minimum apache-cassandra-clientutil-*.jar apache-cassandra-thrift-*.jar libthrift-*.jar slf4j-api-*.jar slf4j-log4j12-*.jar all of these jars can be found in the cassandra distribution. On 04/02/2012 07:40 AM, Rishabh Agrawal wrote: Any

RE: Using Thrift

2012-04-02 Thread Rishabh Agrawal
I didn't find slf4j files in the distribution, so I downloaded them. Can you help me configure them? From: Dave Brosius [mailto:dbros...@mebigfatguy.com] Sent: Monday, April 02, 2012 6:28 PM To: user@cassandra.apache.org Subject: Re: Using Thrift For a thrift client, you need the following jars

RE: Using Thrift

2012-04-02 Thread Sasha Dolgy
Best to read about Maven. Save you some grief. On Apr 2, 2012 3:05 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: I didn't find slf4j files in the distribution, so I downloaded them. Can you help me configure them? *From:* Dave Brosius [mailto:dbros...@mebigfatguy.com] *Sent:*

Re: Using Thrift

2012-04-02 Thread Dave Brosius
slf4j is just a logging facade; if you actually want log files, you need a logger, say log4j-*.jar, in your classpath. Then just configure that with a log4j.properties file. That properties file also needs to be on the classpath. On 04/02/2012 09:05 AM, Rishabh Agrawal wrote: I didn't find
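
A minimal log4j.properties along those lines might look like the following (the appender name and pattern are just examples; adjust to taste):

```properties
# log4j.properties -- must be on the classpath
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n
```

Swap ConsoleAppender for a FileAppender if you want an actual log file.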

Re:

2012-04-02 Thread Everton Lima
??? 2012/4/1 juan quintero quinteros8...@gmail.com -- Everton Lima Aleixo, Bachelor in Computer Science, Universidade Federal de Goiás

Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta2 released

2012-04-02 Thread Sylvain Lebresne
There's an open issue for that: https://issues.apache.org/jira/browse/CASSANDRA-3676 Patch welcome :) -- Sylvain On Sat, Mar 31, 2012 at 8:55 PM, Ben McCann b...@benmccann.com wrote: I'm trying to upgrade Solandra to use 1.1.0-beta2 and think I found a minor issue:

Re: another DataStax OpsCenter question

2012-04-02 Thread Nick Bailey
No. Each agent is responsible for collecting and reporting all the statistics for the node it is installed on, so there shouldn't be any duplication. On Sat, Mar 31, 2012 at 5:01 AM, R. Verlangen ro...@us2.nl wrote: Nick, would that also result in useless duplicates of the statistics?

Compression on client side vs server side

2012-04-02 Thread Ben McCann
Hi, I was curious if I compress my data on the client side with Snappy whether there's any difference between doing that and doing it on the server side? The wiki said that compression works best where each row has the same columns. Does this mean the compression will be more efficient on the

Re: really bad select performance

2012-04-02 Thread David Leimbach
This is all very hypothetical, but I've been bitten by this before. Does row_loaded happen to be a binary or boolean value? If so the secondary index generated by Cassandra will have at most 2 rows, and they'll be REALLY wide if you have a lot of entries. Since Cassandra doesn't distribute
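
The cardinality problem is easy to see in miniature. A secondary index is itself stored like a column family keyed by the indexed value, so a boolean column produces at most two index rows, each as wide as half your data. A hedged Python sketch (the dict-of-lists here only illustrates the shape, not Cassandra's actual index implementation):

```python
from collections import defaultdict

# Toy model: a secondary index maps each distinct indexed value to the
# list of row keys holding that value.
rows = {("row-%d" % i): {"row_loaded": i % 2 == 0} for i in range(10)}

index = defaultdict(list)
for key, cols in rows.items():
    index[cols["row_loaded"]].append(key)

# A boolean column yields at most two index rows, each very wide:
assert set(index) == {True, False}
assert len(index[True]) == 5
```

With millions of rows, those two index rows become enormous, which is exactly the pathology described above.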

RE: Compression on client side vs server side

2012-04-02 Thread Jeremiah Jordan
The server side compression can compress across columns/rows so it will most likely be more efficient. Whether you are CPU bound or IO bound depends on your application and node setup. Unless your working set fits in memory you will be IO bound, and in that case server side compression helps
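
For reference, server-side compression is enabled per column family at creation time. In the cassandra-cli of the 1.0/1.1 era the syntax is roughly as follows (the CF name and chunk length are placeholders; check your version's docs):

```
create column family docs
  with compression_options = {sstable_compression: SnappyCompressor,
                              chunk_length_kb: 64};
```

Because the server compresses whole chunks spanning rows and columns, shared structure between neighbouring rows compresses away, which a per-value client-side scheme cannot exploit.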

Re: Compression on client side vs server side

2012-04-02 Thread Martin Junghanns
Hi, how do you select between client- and server-side compression? I'm using Hector and I set compression when creating a CF, so the compression runs on the server when the data is inserted. Greetings, Martin. On 02.04.2012 17:42, Ben McCann wrote: Hi, I was curious if I compress my

Re: Compression on client side vs server side

2012-04-02 Thread Ben McCann
Thanks Jeremiah, that's what I had suspected. I appreciate the confirmation. Martin, there's no built-in support for doing compression client side, but it'd be easy for me to do manually since I just have one column with all my serialized data, which is why I was considering it. On Mon, Apr
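
The manual client-side approach for a single serialized column is only a few lines. A sketch (python-snappy is a third-party package, so zlib stands in for Snappy here; the record shape is made up for illustration):

```python
import json
import zlib

def pack(record: dict) -> bytes:
    # Serialize and compress on the client before storing in one column.
    # Snappy would be the drop-in choice for speed; zlib is used here
    # only because it ships with the standard library.
    return zlib.compress(json.dumps(record).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))

record = {"user": "ben", "events": list(range(100))}
assert unpack(pack(record)) == record
```

The trade-off versus server-side compression is as Jeremiah describes: the client only ever sees one value at a time, so it cannot compress across rows.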

Nodetool snapshot, consistency and replication

2012-04-02 Thread R. Verlangen
Hi there, I have a question about the nodetool snapshot. Situation: - 3 node cluster - RF = 3 - fully consistent (not measured, but let's say it is) Is it true that when I take a snapshot at only one of the 3 nodes this contains all the data in the cluster (at least 1 replica)? With kind

column’s timestamp

2012-04-02 Thread Avi-h
Is it possible to fetch a column based on the row key and the column’s timestamp only (not using the column’s name)? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/column-s-timestamp-tp7429905p7429905.html Sent from the

Re: column’s timestamp

2012-04-02 Thread Tyler Hobbs
On Mon, Apr 2, 2012 at 11:24 AM, Avi-h avih...@gmail.com wrote: Is it possible to fetch a column based on the row key and the column’s timestamp only (not using the column’s name)? No, but most clients support including the timestamp in the result set, so you can filter the columns by
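
Client-side filtering on the returned timestamps is straightforward. A sketch, assuming the client hands back (name, value, timestamp) triples with microsecond timestamps (the exact result shape varies by client library):

```python
# Columns as a Thrift-era client might return them:
# (column name, value, timestamp in microseconds since the epoch)
columns = [
    ("c1", "v1", 1333000000000000),
    ("c2", "v2", 1333380000000000),
    ("c3", "v3", 1333390000000000),
]

cutoff = 1333350000000000
# Keep only columns written at or after the cutoff:
recent = [c for c in columns if c[2] >= cutoff]
assert [name for name, _value, _ts in recent] == ["c2", "c3"]
```

Note this still fetches every column over the wire; the filtering only happens after the fact.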

Re: column’s timestamp

2012-04-02 Thread Pierre Chalamet
Hi, What about using a timestamp as the column name and doing a get_slice instead? --Original Message-- From: Avi-h To: cassandra-u...@incubator.apache.org ReplyTo: user@cassandra.apache.org Subject: column’s timestamp Sent: Apr 2, 2012 18:24 Is it possible to fetch a column based on the row key
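
The idea is that if the column names themselves are timestamps, the comparator keeps them sorted and a slice over a name range becomes a time-range query. A toy sketch of the access pattern (not actual Thrift calls; the bisect lookup mimics what a SliceRange does server-side):

```python
import bisect

# Row whose column names are timestamps (microseconds), kept sorted
# the way a LongType comparator would keep them in Cassandra.
names = [1333000000000000, 1333100000000000,
         1333200000000000, 1333300000000000]
values = {n: "event-%d" % n for n in names}

def get_slice(start, end):
    # Analogue of a get_slice with SliceRange [start, end)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, end)
    return [(n, values[n]) for n in names[lo:hi]]

result = get_slice(1333100000000000, 1333300000000000)
assert [n for n, _ in result] == [1333100000000000, 1333200000000000]
```

Unlike client-side filtering, only the columns inside the range ever leave the server.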

Re: multi region EC2

2012-04-02 Thread Rob Coli
On Mon, Mar 26, 2012 at 3:31 PM, Deno Vichas wrote: but what if I already have a bunch (8g per node) of data that I need and I don't have a way to re-create it? Note that the below may have unintended consequences if using Counter column families. It actually can be done with the cluster running, below

Re: Nodetool snapshot, consistency and replication

2012-04-02 Thread Rob Coli
On Mon, Apr 2, 2012 at 9:19 AM, R. Verlangen ro...@us2.nl wrote: - 3 node cluster - RF = 3 - fully consistent (not measured, but let's say it is) Is it true that when I take a snapshot at only one of the 3 nodes this contains all the data in the cluster (at least 1 replica)? Yes. =Rob --
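
For completeness, taking a named snapshot on one node looks roughly like this (host and tag name are placeholders):

```shell
# Take a named snapshot on a single node
nodetool -h 127.0.0.1 snapshot -t backup-2012-04-02

# This only covers the whole dataset because RF equals the node count
# and the cluster is consistent; in any other setup, snapshot every node.
```

The snapshot files appear under each keyspace's data directory in a snapshots/ subdirectory.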

Re: Largest 'sensible' value

2012-04-02 Thread Ben Coverston
This is a difficult question to answer for a variety of reasons, but I'll give it a try, maybe it will be helpful, maybe not. The most obvious problem with this is that Thrift is buffer based, not streaming. That means that whatever the size of your chunk it needs to be received, deserialized,
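
Because the whole value must fit in a buffer end to end, the usual workaround for large values is to split them into fixed-size chunks stored as separate, ordered columns. A hedged sketch of that chunking scheme (column naming and chunk size are arbitrary choices, not anything Cassandra prescribes):

```python
CHUNK_SIZE = 1024 * 1024  # e.g. 1 MB per column; tune to heap/latency budget

def split_blob(blob: bytes, chunk_size: int = CHUNK_SIZE):
    # Store each chunk under a zero-padded column name so a column
    # slice re-reads the chunks in the right order.
    return [("chunk-%08d" % i, blob[off:off + chunk_size])
            for i, off in enumerate(range(0, len(blob), chunk_size))]

def join_chunks(chunks):
    # Reassemble by sorting on the ordered column names.
    return b"".join(data for _name, data in sorted(chunks))

blob = bytes(range(256)) * 10000
assert join_chunks(split_blob(blob)) == blob
```

Each chunk then stays comfortably inside Thrift's buffer, at the cost of multiple round trips per value.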

Re: Cassandra - crash with “free() invalid pointer”

2012-04-02 Thread Vijay
Can you send us the stack trace, which you can find in the hs_err_pid*.log? Is the system memory all used up (check free)? Any errors in the logs just before the crash? Regards, /VJ On Mon, Mar 26, 2012 at 12:35 AM, Maciej Miklas mac.mik...@googlemail.comwrote: I have row cache - it's about 20GB

Re: data size difference between supercolumn and regular column

2012-04-02 Thread Yiming Sun
Yup Jeremiah, I learned a hard lesson on how cassandra behaves when it runs out of disk space :-S. I didn't try the compression, but when it ran out of disk space, or near running out, compaction would fail because it needs space to create some tmp data files. I shall get a tattoo that says

Re: Error Replicate on write

2012-04-02 Thread aaron morton
Is JNA.jar in the path ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/04/2012, at 10:11 PM, Carlos Juzarte Rolo wrote: Hi, I've been using cassandra for a while, but after a upgrade to 1.0.7, every machine kept running

Re: Using Thrift

2012-04-02 Thread aaron morton
I would recommend starting with a higher level client like Hector or Astyanax http://wiki.apache.org/cassandra/ClientOptions They have *a lot* of features and will make it easier to focus on learning how to use Cassandra. Then when you know what you like or do not like about the existing

Re: Cassandra CF merkle tree

2012-04-02 Thread aaron morton
No it's internal only. Take a look at o.a.c.service.AntiEntropyService Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 3/04/2012, at 12:21 AM, Thomas van Neerijnen wrote: Hi all Is there a way I can easily retrieve a Merkle tree for

Re: Largest 'sensible' value

2012-04-02 Thread Franc Carter
On Tue, Apr 3, 2012 at 4:18 AM, Ben Coverston ben.covers...@datastax.comwrote: This is a difficult question to answer for a variety of reasons, but I'll give it a try, maybe it will be helpful, maybe not. The most obvious problem with this is that Thrift is buffer based, not streaming. That

Re: Using Thrift

2012-04-02 Thread Hari Prasad Siripuram
I faced the same issue; you can find a similar one here: http://stackoverflow.com/questions/8370365/debugging-bizarre-spring-slf4j-jar-issue Also, the Spring community acknowledges the SLF4J issue here (commons-logging issue):

Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta2 released

2012-04-02 Thread Ben McCann
Cool. Thanks. That should be easy enough to fix :-) On Mon, Apr 2, 2012 at 8:05 AM, Sylvain Lebresne sylv...@datastax.comwrote: There's an open issue for that: https://issues.apache.org/jira/browse/CASSANDRA-3676 Patch welcome :) -- Sylvain On Sat, Mar 31, 2012 at 8:55 PM, Ben McCann

key cache size calculation

2012-04-02 Thread Shoaib Mir
Hi guys, We are calculating key cache size right now. There is a column family with ~100 million columns and right now we have the cache size set at 2 million. I suspect that the active data we have is not all fitting in the 2 million cache size and we at times are getting query execution

Re: data size difference between supercolumn and regular column

2012-04-02 Thread aaron morton
If you have a workload with overwrites you will end up with some data needing compaction. Running a nightly manual compaction would remove this, but it will also soak up some IO so it may not be the best solution. I do not know if Leveled compaction would result in a smaller disk load for the

Re: key cache size calculation

2012-04-02 Thread aaron morton
Take a look at the key cache hit rate in nodetool cfstats. One approach is to increase the cache size until you do not see a matching increase in the hit rate. Is there a limit to key cache size? I know that is all taken from heap but how much max we can go with setting the key cache
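
A quick way to eyeball those numbers per column family (exact label text may vary between Cassandra versions):

```shell
# Show key cache capacity, size, and hit rate for every CF
nodetool -h 127.0.0.1 cfstats | grep -i 'key cache'
```

Re-run it after each cache-size bump; once the hit rate stops climbing with the size, further increases are just spending heap.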

Re: key cache size calculation

2012-04-02 Thread Shoaib Mir
On Tue, Apr 3, 2012 at 11:49 AM, aaron morton aa...@thelastpickle.comwrote: Take a look at the key cache hit rate in nodetool cfstats. One approach is to increase the cache size until you do not see a matching increase in the hit rate. Thanks Aaron, what do you think will be the ideal