How to use Secondary Indices 0.7.0beta1

2010-08-17 Thread Thorvaldsson Justus
I have figured some of it out but I am stuck, and would much appreciate help understanding how to use secondary indices. Create a column family and define the secondary indices: " CfDef cdef = new CfDef(); cdef.setColumn_type(columntype); cdef.setComment(comment); cdef.setComparator_type(comparatortype); cdef.s

Re: Cassandra and Pig

2010-08-17 Thread Christian Decker
Ok, by now it's getting very strange. I deleted the entire installation and restarted from scratch and now I'm getting a similar error even though I'm going through the pig_cassandra script. 2010-08-17 15:54:10,049 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduce

Re: Cassandra gem

2010-08-17 Thread Mark
On 8/16/10 11:37 PM, Benjamin Black wrote: I'm testing with the default cassandra.yaml. I cannot reproduce the output in that gist, however: thrift_client = client.instance_variable_get(:@client) => nil Also, the Thrift version for 0.7 is 11.0.0, according to the code I have. Can someone c

data deleted came back after 9 days.

2010-08-17 Thread Zhong Li
Hi All, We have a strange issue here. We have 10 nodes across 5 datacenters. Today I found that, on one node, some deleted data came back after 8-9 days. The data was saved on one node and retrieved/deleted on another node in a remote datacenter. The CF is a super column family. What is possi

Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li
The Cassandra version is 0.6.3. On Aug 17, 2010, at 11:39 AM, Zhong Li wrote: Hi All, We have a strange issue here. We have 10 nodes across 5 datacenters. Today I found that, on one node, some deleted data came back after 8-9 days. The data was saved on one node and retrieved/deleted on anot

Re: data deleted came back after 9 days.

2010-08-17 Thread Peter Schuller
>> We have 10 nodes cross  5 datacenters. Today I found a strange thing. On >> one node, few data deleted came back after 8-9 days. >> >> The data saved on a node and retrieved/deleted on another node in a remote >> datacenter. The CF is a super column. >> >> What is possible causing this? What is

Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li
864000. It is the default, 10 days. I checked every system.log: all nodes are connected, although not all the time, and they reconnected after a few minutes. No node was disconnected for longer than GC grace seconds. Best, On Aug 17, 2010, at 11:53 AM, Peter Schuller wrote: We have 10 nodes across 5
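As a quick check on the figure quoted in this message, 864000 seconds does work out to 10 days. The variable names below are illustrative only and are not Cassandra settings.

```python
# Convert the GC grace period quoted in the thread from seconds to days.
GC_GRACE_SECONDS = 864000        # value quoted in the message above
SECONDS_PER_DAY = 24 * 60 * 60   # 86400 seconds in a day

gc_grace_days = GC_GRACE_SECONDS / SECONDS_PER_DAY
print(gc_grace_days)
```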

TTransportException intermittently in 0.7

2010-08-17 Thread Andres March
We are testing bulk data loads using thrift. About 5% of operations are failing on the following exception. It appears that it is not getting any response (end of file) on the batch mutate response. I'll try to create a test case to demonstrate the behavior. Caused by: org.apache.thrift.tr

Videos of the cassandra summit starting to be posted

2010-08-17 Thread Jeremy Hanna
The videos of the cassandra summit are starting to be posted, just fyi for those who were unable to make it out to SF. http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010

move data between clusters

2010-08-17 Thread Artie Copeland
What is the best way to move data between clusters? We currently have a 4-node prod cluster with 80G of data and want to move it to a dev env with 3 nodes. We have plenty of disk. We were looking into nodetool snapshot, but it looks like that won't work because of the system tables. sstable2json does

cache sizes using percentages

2010-08-17 Thread Artie Copeland
If I set a key cache size of 100%, the way I understand it works is: - the cache is not write-through, but read-through - a key gets added to the cache on the first read if not already available - the size of the cache will always increase for every item read. So if you have 100mil items your

Re: move data between clusters

2010-08-17 Thread Benjamin Black
without answering your whole question, just fyi: there is a matching json2sstable command for going the other direction. On Tue, Aug 17, 2010 at 10:48 AM, Artie Copeland wrote: > what is the best way to move data between clusters.  we currently have a 4 > node prod cluster with 80G of data and wa

Re: cache sizes using percentages

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 1:55 PM, Artie Copeland wrote: > if i set a key cache size of 100% the way i understand how that works is: > - the cache is not write through, but read through > - a key gets added to the cache on the first read if not already available > - the size of the cache will always

Re: cache sizes using percentages

2010-08-17 Thread Ryan King
On Tue, Aug 17, 2010 at 10:55 AM, Artie Copeland wrote: > if i set a key cache size of 100% the way i understand how that works is: > - the cache is not write through, but read through > - a key gets added to the cache on the first read if not already available > - the size of the cache will alway

Re: indexing rows ordered by int

2010-08-17 Thread S Ahmed
So when using Redis, how do you go about updating the index? Do you serialize changes to the index i.e. when someone votes, you then update the index? Little confused as to how to go about updating a huge index. Say you have 1 million stores, and you want to order by the top votes, how would you

Re: TTransportException intermittently in 0.7

2010-08-17 Thread Jonathan Ellis
are there any errors on your server logs? On Tue, Aug 17, 2010 at 11:46 AM, Andres March wrote: > We are testing bulk data loads using thrift.  About 5% of operations are > failing on the following exception.  It appears that it is not getting any > response (end of file) on the batch mutate resp

Re: move data between clusters

2010-08-17 Thread Jonathan Ellis
you can either use get_range_slices to scan through all your rows and batch_mutate them into the 2nd cluster, or you can start a test cluster with the same number of nodes as the live one and just scp everything over, 1 to 1. it's possible but highly error-prone to manually slice and dice data fil
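The first option in this reply (scan every row, then write it into the second cluster) could look roughly like the sketch below. `ToyCluster` is a hypothetical in-memory stand-in, not the real Thrift client. Its `get_range_slices` and `batch_mutate` methods only borrow the names of the real calls and use simplified signatures. The point is the paging pattern: the start key is inclusive, so each page after the first repeats one row that must be skipped.

```python
class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster (not the Thrift API)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def get_range_slices(self, start_key, count):
        # Return up to `count` (key, columns) pairs with key >= start_key, in key order.
        keys = sorted(k for k in self.rows if k >= start_key)
        return [(k, self.rows[k]) for k in keys[:count]]

    def batch_mutate(self, mutations):
        # Apply a batch of {key: columns} writes.
        for key, columns in mutations.items():
            self.rows.setdefault(key, {}).update(columns)


def copy_all_rows(source, target, page_size=100):
    """Scan `source` page by page and write each page into `target`."""
    if page_size < 2:
        # With a page of one row, the repeated start row would be the whole page.
        raise ValueError("page_size must be at least 2")
    start_key = ""
    copied = 0
    first_page = True
    while True:
        page = source.get_range_slices(start_key, page_size)
        # The start key is inclusive, so skip the row already copied.
        fresh = page if first_page else page[1:]
        if fresh:
            target.batch_mutate({key: cols for key, cols in fresh})
            copied += len(fresh)
        if len(page) < page_size:
            return copied
        start_key = page[-1][0]
        first_page = False


source = ToyCluster({"row%02d" % i: {"col": str(i)} for i in range(7)})
target = ToyCluster()
copied = copy_all_rows(source, target, page_size=3)
print(copied, target.rows == source.rows)
```

A real copy would also need to page through wide rows and carry over column timestamps, which this sketch leaves out.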

Re: data deleted came back after 9 days.

2010-08-17 Thread Jonathan Ellis
It doesn't have to be disconnected more than GC grace seconds to cause what you are seeing, it just has to be disconnected at all (thus missing delete commands). Thus you need to be running repair more often than gcgrace, or confident that read repair will handle it for you (which clearly is not t
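A toy model of the failure described here may help. It assumes two replicas, last-write-wins timestamps, and a tombstone that compaction drops once it is older than the grace period. All names (`ToyReplica`, `read_repair`, `scenario`) and the day-based clock are invented for illustration and are not Cassandra code.

```python
GC_GRACE = 10  # days; toy stand-in for GC grace seconds


class ToyReplica:
    def __init__(self):
        # key -> (timestamp, value); a value of None marks a tombstone
        self.cells = {}

    def write(self, key, timestamp, value):
        # Last write wins: keep the version with the newest timestamp.
        current = self.cells.get(key)
        if current is None or timestamp > current[0]:
            self.cells[key] = (timestamp, value)

    def compact(self, now):
        # Drop tombstones that are older than the grace period.
        for key, (ts, value) in list(self.cells.items()):
            if value is None and now - ts > GC_GRACE:
                del self.cells[key]


def read_repair(replicas, key):
    # Pick the newest version any replica holds and push it to all of them.
    versions = [r.cells[key] for r in replicas if key in r.cells]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        r.write(key, *newest)
    return newest[1]


def scenario(repair_before_gc):
    a, b = ToyReplica(), ToyReplica()
    for r in (a, b):
        r.write("k", 0, "v")          # day 0: both replicas store the value
    a.write("k", 1, None)             # day 1: delete reaches a; b misses it
    if repair_before_gc:
        read_repair([a, b], "k")      # tombstone reaches b within the grace period
    for r in (a, b):
        r.compact(now=12)             # day 12: tombstones older than 10 days are dropped
    return read_repair([a, b], "k")   # what a later read sees


print(scenario(repair_before_gc=False), scenario(repair_before_gc=True))
```

Without the repair step, replica `a` forgets the delete at compaction while `b` still holds the old value, so the next read copies that value back to `a`.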

Re: data deleted came back after 9 days.

2010-08-17 Thread Jeremy Dunck
On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis wrote: > It doesn't have to be disconnected more than GC grace seconds to cause > what you are seeing, it just has to be disconnected at all (thus > missing delete commands). > > Thus you need to be running repair more often than gcgrace, or > confid

Re: data deleted came back after 9 days.

2010-08-17 Thread Ned Wolpert
(gurus, please check my logic here... I'm trying to validate my understanding of this situation.) Isn't the issue that while a server was disconnected, a delete could have occurred, and thus the disconnected server never got the 'tombstone'? (http://wiki.apache.org/cassandra/DistributedDeletes) W

RE: TTransportException intermittently in 0.7

2010-08-17 Thread March, Andres
No errors in server logs. Let me know if you have any debug recommendations. I'm just starting to set it up. - Andres From: Jonathan Ellis [jbel...@gmail.com] Sent: Tuesday, August 17, 2010 12:44 PM To: user@cassandra.apache.org Subject: Re: TTransportEx

Re: indexing rows ordered by int

2010-08-17 Thread Benjamin Black
http://code.google.com/p/redis/wiki/SortedSets On Tue, Aug 17, 2010 at 12:33 PM, S Ahmed wrote: > So when using Redis, how do you go about updating the index? > Do you serialize changes to the index i.e. when someone votes, you then > update the index? > Little confused as to how to go about upda
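For the question quoted in this reply, the sorted-set idea can be sketched with a small in-memory model. `ToySortedSet` is hypothetical and is not a Redis client. Its two methods only mimic the behaviour of the Redis commands ZINCRBY and ZREVRANGE for non-negative indexes. Each vote is a single score increment, and the top-N query reads the members in descending score order.

```python
class ToySortedSet:
    """In-memory model of a sorted set: member -> score (not a Redis client)."""

    def __init__(self):
        self.scores = {}

    def zincrby(self, member, amount=1):
        # Add `amount` to the member's score, creating it at 0 if missing.
        self.scores[member] = self.scores.get(member, 0) + amount
        return self.scores[member]

    def zrevrange(self, start, stop):
        # Members from highest to lowest score; `stop` is inclusive.
        # Negative indexes are not supported in this toy model.
        ordered = sorted(self.scores, key=lambda m: (self.scores[m], m), reverse=True)
        return ordered[start:stop + 1]


votes = ToySortedSet()
for store in ["a", "b", "a", "c", "a", "b"]:
    votes.zincrby(store)          # one increment per vote

top_two = votes.zrevrange(0, 1)   # the two stores with the most votes
print(top_two)
```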

Errors on CF with index

2010-08-17 Thread Ed Anuff
I'm finding that once I add an index to a column family that I start getting exceptions as I try to add rows to it. It works fine if I don't define the column metadata. Any ideas what would cause this? ERROR 12:44:21,477 Error in ThreadPoolExecutor java.lang.RuntimeException: java.lang.ArrayInde

Re: Errors on CF with index

2010-08-17 Thread Eric Evans
On Tue, 2010-08-17 at 14:04 -0700, Ed Anuff wrote: > > I'm finding that once I add an index to a column family that I start > getting > exceptions as I try to add rows to it. It works fine if I don't > define the > column metadata. Any ideas what would cause this? > > ERROR 12:44:21,477 Error i

Re: Errors on CF with index

2010-08-17 Thread Ed Anuff
Yup, that's it, r986486 on Table.java made the problem go away, talk about great timing :) On Tue, Aug 17, 2010 at 2:38 PM, Eric Evans wrote: > On Tue, 2010-08-17 at 14:04 -0700, Ed Anuff wrote: > > > > I'm finding that once I add an index to a column family that I start > > getting > > exceptio

Map/Reduce over Cassandra

2010-08-17 Thread Bill Hastings
Hi All, how performant is M/R on Cassandra compared to running it on HDFS? Does anyone have numbers they can share? Specifically, how much data the M/R job was run against, what the throughput was, etc. Any information would be very helpful. -- Cheers Bill

Re: Cassandra gem

2010-08-17 Thread Benjamin Black
Updated code is now in my master branch, with the reversion to 10.0.0. Please let me know of further trouble. b On Tue, Aug 17, 2010 at 8:31 AM, Mark wrote: >  On 8/16/10 11:37 PM, Benjamin Black wrote: >> >> I'm testing with the default cassandra.yaml. >> >> I cannot reproduce the output in t

Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li
The data was inserted on one node, then deleted on a remote node less than 2 seconds later. So it is very possible that some node missed the tombstone when the connection was lost. My question: can a ConsistencyLevel.ALL read bring the lost tombstone back, instead of running repair? On Aug 17, 2010, at 4:11 PM, Ned Wol

cassandra for a inbox search with high reading qps

2010-08-17 Thread Chen Xinli
Hi, We are going to use Cassandra for search, such as inbox search. The read QPS is very high, so we'd like to read at ConsistencyLevel.ONE and disable read-repair at the same time. For reads to be consistent under these conditions, writes should use ConsistencyLevel.ALL. But the writi

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 10:55 PM, Chen Xinli wrote: > Hi, > > We are going to use Cassandra for search, such as inbox search. > The read QPS is very high, so we'd like to read at ConsistencyLevel.ONE and > disable read-repair at the same time. > > For reads to be consistent under these cond

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Chen Xinli
I'm using Cassandra 0.6.4; there's a configuration option DoConsistencyChecksBoolean in storage-conf.xml. Isn't that for read-repair? I will test QUORUM writes with ONE reads to see if that meets our requirements. 2010/8/18 Edward Capriolo > On Tue, Aug 17, 2010 at 10:55 PM, Chen Xinli wrote: > >
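Whether a read is guaranteed to overlap the most recent write can be checked with the usual rule R + W > N, where N is the replication factor and R and W are the numbers of replicas that must answer a read and a write. The sketch below assumes N = 3, which the thread does not state, and the function names are illustrative.

```python
def quorum(replication_factor):
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def read_overlaps_latest_write(n, w, r):
    # The read set and write set share at least one replica when r + w > n.
    return r + w > n


N = 3  # assumed replication factor; not given in the thread

write_all_read_one = read_overlaps_latest_write(N, w=N, r=1)
write_quorum_read_one = read_overlaps_latest_write(N, w=quorum(N), r=1)
write_quorum_read_quorum = read_overlaps_latest_write(N, w=quorum(N), r=quorum(N))

print(write_all_read_one, write_quorum_read_one, write_quorum_read_quorum)
```

Under that assumption, QUORUM writes with ONE reads give 2 + 1 = 3, which is not greater than 3, so such a read may return stale data unless something like read-repair or repair brings the replicas together.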

Re: Videos of the cassandra summit starting to be posted

2010-08-17 Thread samal gorai
Thanks to the Riptano group for your support in community education. On Tue, Aug 17, 2010 at 11:15 PM, Jeremy Hanna wrote: > The videos of the cassandra summit are starting to be posted, just fyi for > those who were unable to make it out to SF. > > http://www.riptano.com/blog/slides-and-videos-cassandra-

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:55 PM, Chen Xinli wrote: > Hi, > > We are going to use Cassandra for search, such as inbox search. > The read QPS is very high, so we'd like to read at ConsistencyLevel.ONE and > disable read-repair at the same time. > In 0.7 you can set a probability for r

Re: data deleted came back after 9 days.

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li wrote: > The data was inserted on one node, then deleted on a remote node less > than 2 seconds later. So it is very possible that some node missed the tombstone when > the connection was lost. > My question: can a ConsistencyLevel.ALL read bring the lost tombstone back, > inste

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Chen Xinli
Thanks for your reply. Cassandra, in our case, is used for search, not for data storage. We can build the Cassandra keyspace data daily/weekly when system load is lower. We have modified the Cassandra code to add a value filter, which prevents data-repair from working. The value filter

Re: Cassandra gem

2010-08-17 Thread Mark
On 8/17/10 5:44 PM, Benjamin Black wrote: Updated code is now in my master branch, with the reversion to 10.0.0. Please let me know of further trouble. b On Tue, Aug 17, 2010 at 8:31 AM, Mark wrote: On 8/16/10 11:37 PM, Benjamin Black wrote: I'm testing with the default cassandra.yaml.