Re: indexes from CassandraSF

2011-11-13 Thread Ed Anuff
/2011 02:41, Ed Anuff wrote: 1) The index updates should be eventually consistent.  This does mean that you can get a transient false-positive on your search results. If this doesn't work for you, then you either need to use ZK or some other locking solution or do read repair by making sure

Re: indexes from CassandraSF

2011-11-12 Thread Ed Anuff
1) The index updates should be eventually consistent. This does mean that you can get a transient false-positive on your search results. If this doesn't work for you, then you either need to use ZK or some other locking solution or do read repair by making sure that the row you retrieve contains
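A minimal sketch of the read-side check mentioned above, with plain Python dicts standing in for the index and data column families. The names `rows`, `index`, and `search` are hypothetical, and the idea that the check compares the retrieved row against the searched value is an assumption drawn from the truncated message.

```python
# In-memory stand-ins for two column families (hypothetical, not a Cassandra API).
rows = {
    "user1": {"city": "Austin"},
    "user2": {"city": "Boston"},  # was "Austin" before an update
}
# The index still lists user2 under "Austin": a transient false positive.
index = {"Austin": {"user1", "user2"}, "Boston": {"user2"}}


def search(column, value):
    """Return (keys whose current row holds `value`, stale index entries)."""
    hits, stale = [], []
    for key in sorted(index.get(value, ())):
        if rows.get(key, {}).get(column) == value:
            hits.append(key)
        else:
            stale.append(key)  # candidate for removal from the index
    return hits, stale


hits, stale = search("city", "Austin")
print(hits, stale)
```

The stale list is what a repair step could delete from the index row after the read.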

Re: Second Cassandra users survey

2011-11-07 Thread Ed Anuff
This is basically what entity groups are about - https://issues.apache.org/jira/browse/CASSANDRA-1684 On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote: This feature interests me, so I thought I'd add some comments. Having used partition features in existing databases like DB2,

Re: Second Cassandra users survey

2011-11-06 Thread Ed Anuff
On Sun, Nov 6, 2011 at 12:52 AM, Radim Kolar h...@sendmail.cz wrote: - support for atomic operations or batches (if QUORUM fails, data should not be visible with ONE) zookeeper is solving that. I'd like to see official support for Zookeeper inside of Cassandra. I'd like it to be something that

Re: CompositeType for use with 0.7

2011-11-05 Thread Ed Anuff
It was developed for 0.7x but we then made a few changes so that it would work with 0.8-rc1 that broke the 0.7 compatibility. The idea at the time was to do a 0.7 branch for it but it looks like that never got checked in. If you roll back to the previous commit it should give you a version that

data model for unique users in a time period

2011-10-31 Thread Ed Anuff
I'm looking at the scenario of how to keep track of the number of unique visitors within a given time period. Inserting user ids into a wide row would allow me to have a list of every user within the time period that the row represented. My experience in the past was that using get_count on a
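A rough sketch of the wide-row idea in the message above, using a Python dict in place of a column family. The hourly bucket and the names `bucket_key`, `record_visit`, and `unique_visitors` are assumptions made for illustration only.

```python
from datetime import datetime, timezone

# row key (time bucket) -> {column name (user id): empty value}
wide_rows = {}


def bucket_key(ts):
    """Row key for the hour containing `ts` (hourly buckets are an assumption)."""
    return ts.strftime("%Y-%m-%dT%H")


def record_visit(user_id, ts):
    # Re-inserting the same user id overwrites the same column,
    # so each user appears once per row.
    wide_rows.setdefault(bucket_key(ts), {})[user_id] = b""


def unique_visitors(ts):
    """Count the columns in the bucket's row."""
    return len(wide_rows.get(bucket_key(ts), {}))


t = datetime(2011, 10, 31, 12, 5, tzinfo=timezone.utc)
record_visit("alice", t)
record_visit("alice", t.replace(minute=40))
record_visit("bob", t)
record_visit("carol", t.replace(hour=13))
print(unique_visitors(t))
```

Counting here is a `len` over the row, which mirrors a column count over the whole wide row; the sketch says nothing about how that count performs on a real cluster.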

Re: data model for unique users in a time period

2011-10-31 Thread Ed Anuff
to make each query for the count a bit faster. Depending on how often this query would be hit, I would still recommend caching, but you could calculate reality a little more often. Zach On Mon, Oct 31, 2011 at 12:22 PM, Ed Anuff e...@anuff.com wrote: I'm looking at the scenario of how

[ANN] Usergrid, Open Source Mobile Data Platform built on Cassandra

2011-10-03 Thread Ed Anuff
I made mention of this during my presentation at the Cassandra Summit back in July, but we're finally ready to release the source for Usergrid. This is a mobile platform stack built on top of Cassandra and using Hector and we're making the full source code available on GitHub. We'll be offering

Re: Customized Secondary Index Schema

2011-08-25 Thread Ed Anuff
/8/24 Ryan King r...@twitter.com On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW alvi...@gmail.com wrote: Hello, As mentioned by Ed Anuff in his blog and slides, one way to build customized secondary index is: We use one CF, each row to represent a secondary index, with the secondary index

Re: Customized Secondary Index Schema

2011-08-25 Thread Ed Anuff
name. This will ensure that your index is evenly distributed throughout your cluster. - Original Message - From: Ed Anuff e...@anuff.com To: user@cassandra.apache.org Sent: Thursday, August 25, 2011 12:48:49 PM Subject: Re: Customized Secondary Index Schema How many unique last names

Re: HOW TO select a column or all columns that start with X

2011-08-03 Thread Ed Anuff
I believe you can set start to be ABC_ and finish to be ABC_\ (for UTF8) to get everything that contains exactly ABC_ and set finish to ABC_\ to get everything that starts with ABC_. You probably want to do a simple string comparison test to verify. On Tue, Aug 2, 2011 at 6:50 PM, Tyler
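The range idea above can be tried out on a sorted list of column names in Python. This is only a sketch: the character appended to the prefix did not survive in the archived snippet, so the use of `"\uffff"` as a high sentinel is an assumption, and `slice_range` is a hypothetical helper rather than a Cassandra call.

```python
from bisect import bisect_left, bisect_right

# Column names kept in sorted order, as a comparator would keep them.
columns = sorted(["ABB_9", "ABC_", "ABC_1", "ABC_zeta", "ABD_0"])


def slice_range(names, start, finish):
    """Inclusive start/finish slice over sorted column names."""
    return names[bisect_left(names, start):bisect_right(names, finish)]


prefix = "ABC_"
exact = slice_range(columns, prefix, prefix)
# Assumed sentinel: a character that sorts after any expected suffix.
starts_with = slice_range(columns, prefix, prefix + "\uffff")
print(exact, starts_with)
```

As the message suggests, a simple string comparison test against real data is still the way to confirm the bounds behave as intended.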

Re: Planet Cassandra (an aggregation site for Cassandra News)

2011-08-03 Thread Ed Anuff
Awesome, great news! On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender line...@gmail.com wrote: Greetings all, I just wanted to send a note out to let everyone know about Planet Cassandra -- an aggregation site for Cassandra news and blogs. Andrew Llavore from DataStax and I built the site. We

Re: hector-jpa

2011-06-06 Thread Ed Anuff
That's a work in progress and actually represents the next generation of JPA in Hector. There is a more lightweight version present in the release version of Hector called Hector Object Mapper. I'm sure Nate or Todd who've worked more on hector-jpa can elaborate. Ed On Mon, Jun 6, 2011 at 2:58

Re: hector-jpa

2011-06-06 Thread Ed Anuff
now? The data nucleus plugin? I don't need the query parts or anything, I just don't want to have to translate columns to Java fields and vice versa. On Mon, Jun 6, 2011 at 6:25 PM, Ed Anuff e...@anuff.com wrote: That's a work in progress and actually represents the next generation of JPA

Re: Ant error in Eclipse when building Cassandra

2011-05-07 Thread Ed Anuff
, Jonathan Ellis jbel...@gmail.com wrote: Default stack is huge, so maven-ant-tasks-retrieve-build is probably recursing infinitely somewhere :( On Fri, May 6, 2011 at 2:42 PM, Ed Anuff e...@anuff.com wrote: I finally got around to getting Eclipse set up to build Cassandra following the directions

Ant error in Eclipse when building Cassandra

2011-05-06 Thread Ed Anuff
I finally got around to getting Eclipse set up to build Cassandra following the directions on the wiki and it seems to be working, Eclipse isn't showing any errors except that when it fires off the automatic ant build I get the following error: maven-ant-tasks-retrieve-build: BUILD FAILED

Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Ed Anuff
Sounds like the problem might be on the hector side. Lots of hector users on this list, but usually not a bad idea to ask on hector-us...@googlegroups.com (cc'd). The jetty servers stopping responding is a bit vague, somewhere in your logs is an error message that should shed some light on where

Re: ballpark low cardinality range for secondary indexes

2011-04-08 Thread Ed Anuff
If you're just indexing on a single column value and the values have low cardinality in, say, the 10's - I'd have a wide row for each cardinal value that contained the set of keys for rows that contained that value. For higher levels of cardinality or if you're indexing on multiple columns, there

Re: Problem with UUID

2011-04-08 Thread Ed Anuff
Hmm, if you're really doing this, you're not getting a time uuid: UUID timeUUID = getTimeUUID().randomUUID(); That call to randomUUID() is invoking the static randomUUID() method in java.util.UUID which is generating a non-time random uuid. I'm not sure why you're getting that error message
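The difference between a time-based and a random UUID shows up in the version field. The sketch below uses Python's standard `uuid` module as a stand-in for the Java code in the thread; `java.util.UUID` exposes the same information through its `version()` method.

```python
import uuid

time_based = uuid.uuid1()    # version 1: timestamp plus node
random_based = uuid.uuid4()  # version 4: random bits, no timestamp

print(time_based.version, random_based.version)
```

A column family sorted as time UUIDs expects version 1 values, so checking the version is a quick way to see which kind a generator is actually returning.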

Re: ballpark low cardinality range for secondary indexes

2011-04-08 Thread Ed Anuff
for another), multiple columns need to be indexed, needs sorted order. Hope that amazon paper has some good tips on solving the transactional gotcha :-) -Adi On Fri, Apr 8, 2011 at 3:49 PM, Ed Anuff e...@anuff.com wrote: If you're just indexing on a single column value and the values have

Re: Problem with UUID

2011-04-08 Thread Ed Anuff
Силка sylkaa...@gmail.com: Then how can I generate a correct time UUID key in Java? On April 8, 2011 at 22:58, Ed Anuff e...@anuff.com wrote: Hmm, if you're really doing this, you're not getting a time uuid: UUID timeUUID = getTimeUUID().randomUUID(); That call to randomUUID() is invoking

Re: Problem with UUID

2011-04-08 Thread Ed Anuff
UUID timeUUID = getTimeUUID(); doesn't solve my problem. On April 9, 2011 at 01:16, Ed Anuff e...@anuff.com wrote: Oops, I should have been more clear. You have this code: UUID timeUUID = getTimeUUID().randomUUID(); what you need is this code: UUID timeUUID = getTimeUUID(); What I meant

Re: Ditching Cassandra

2011-03-30 Thread Ed Anuff
My concern when I see something like this is it might cause developers on the project to get worried and start to try to solve the wrong problems. Cassandra is not going to be as easy as Mongo, certainly not any time soon. CQL won't do it, although it will help. This isn't a criticism of

Re: Any way to get different unique time UUIDs for the same time value?

2011-03-30 Thread Ed Anuff
If I understand the question, it's not that UUIDGen.makeType1UUIDFromHost(InetAddress.getLocalHost()) is returning duplicate UUID's. It should always be giving unique time-based uuids and has checks to make sure it does. The question was whether it was possible to get multiple unique time-based

Re: Any way to get different unique time UUIDs for the same time value?

2011-03-30 Thread Ed Anuff
the same millisecond, then the ordering is not preserved) - Drew On Mar 30, 2011, at 4:13 PM, Ed Anuff wrote: If I understand the question, it's not that UUIDGen.makeType1UUIDFromHost(InetAddress.getLocalHost()) is returning duplicate UUID's.  It should always be giving unique time-based uuids

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
It's nice to see some testing in this regard; however, it's worth pointing out something that gets lost in CF index vs secondary index discussions. What you're really proving is that get_slice (across columns) is faster than get_indexed_slices (across keys). For up to a certain size (and it would

Re: Homebrew CF-indexing vs secondary indexing

2011-02-25 Thread Ed Anuff
that we should design data model such that row keys actually become columns (and create secondary index) so that the data retrieval is faster. I am soon setting up big test instances to test all this. On Fri, Feb 25, 2011 at 11:18 AM, Ed Anuff e...@anuff.com wrote: It's nice to see some testing

Re: Understanding Indexes

2011-02-24 Thread Ed Anuff
If you mean does it make sense to have a CF where each row contains a set of keys to other rows in another CF, then yes, that's a common design pattern, although usually it's because you're creating collections of those rows (i.e. a Groups CF where each row consists of a set of keys to rows in the

Re: Understanding Indexes

2011-02-24 Thread Ed Anuff
It all depends on what you're trying to do. What you're proposing doing, by definition, is creating a secondary index. The primary index is your row key. Depending on the partitioner, it might or might not be a conveniently iterable index or sorted index. If you need your keys sorted in a

Latest Hector release (0.7.0-26) includes experimental virtual keyspace support

2011-02-11 Thread Ed Anuff
The latest version of the Hector Java client has experimental support for a virtual keyspaces feature that transparently adds and removes a prefix to all row keys sent between Hector and Cassandra. There's a small write up of it here: https://github.com/rantav/hector/wiki/Virtual-Keyspaces The
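To illustrate the prefixing idea only, here is a small Python sketch in which a wrapper adds a prefix on the way in and strips it on the way out. `PrefixedStore` and its methods are hypothetical names, and this is not Hector's implementation or API.

```python
class PrefixedStore:
    """Wraps a shared key/value store and hides a per-tenant key prefix."""

    def __init__(self, backing, prefix):
        self._backing = backing
        self._prefix = prefix

    def put(self, key, value):
        self._backing[self._prefix + key] = value  # prefix added on write

    def get(self, key):
        return self._backing.get(self._prefix + key)

    def keys(self):
        # Prefix removed on read, so callers only ever see their own keys.
        n = len(self._prefix)
        return sorted(k[n:] for k in self._backing if k.startswith(self._prefix))


shared = {}
tenant_a = PrefixedStore(shared, "a:")
tenant_b = PrefixedStore(shared, "b:")
tenant_a.put("row1", "alpha")
tenant_b.put("row1", "beta")
print(tenant_a.get("row1"), tenant_b.get("row1"), sorted(shared))
```

Both tenants use the key `row1`, yet the shared store holds two distinct physical keys.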

Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread Ed Anuff
Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single

Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread Ed Anuff
it and was helping maintain it before getting pulled off onto other projects). On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote: Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean

Re: Is there any way to store muti-version data based on the timestamp?

2010-12-01 Thread Ed Anuff
If you go this route, be sure to take a look at the custom column comparator I wrote to make this sort of thing easier: https://github.com/edanuff/CassandraCompositeType On Wed, Dec 1, 2010 at 4:56 AM, Daniel Lundin d...@eintr.org wrote: You could also use a standard column family, composing

Re: Achieving isolation on single row modifications with batch_mutate

2010-11-30 Thread Ed Anuff
It's hard to tell without knowing the nature of the data you're writing, but you might want to think about whether you can embed any sort of version number and/or checksum into the column names of the chunk columns. That way, you could very easily determine that the data you wanted to
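One way the suggestion above might look, sketched in Python with a dict standing in for a row. The column-name layout `v<version>:<chunk index>:<crc32>` and the helper names are assumptions chosen for illustration.

```python
import zlib


def chunk_columns(data, version, chunk_size):
    """Split `data` into columns named v<version>:<index>:<crc32 of chunk>."""
    cols = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        name = f"v{version:08d}:{i // chunk_size:06d}:{zlib.crc32(chunk):08x}"
        cols[name] = chunk
    return cols


def read_version(row, version):
    """Reassemble one version, verifying each chunk against its checksum."""
    prefix = f"v{version:08d}:"
    parts = []
    for name in sorted(n for n in row if n.startswith(prefix)):
        crc = int(name.rsplit(":", 1)[1], 16)
        if zlib.crc32(row[name]) != crc:
            raise ValueError(f"checksum mismatch in column {name}")
        parts.append(row[name])
    return b"".join(parts)


row = {}
row.update(chunk_columns(b"hello world", 1, 4))
row.update(chunk_columns(b"HELLO", 2, 4))  # a later write with a new version
print(read_version(row, 1), read_version(row, 2))
```

Because each version lives under its own column-name prefix, a reader can pick one version and ignore chunks from a write that is still in flight.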

Internal error processing get_indexed_slices?

2010-08-27 Thread Ed Anuff
Seeing this error on the latest build with code that worked fine previously. Any ideas? 2010-08-27 17:24:45,037 ERROR (pool-1-thread-2) [org.apache.cassandra.thrift.Cassandra$Processor] - Internal error processing get_indexed_slices java.lang.NoSuchMethodError:

Re: Internal error processing get_indexed_slices?

2010-08-27 Thread Ed Anuff
Never mind, did an ant clean and then rebuilt and it looks fine now. Ed On Fri, Aug 27, 2010 at 5:30 PM, Ed Anuff e...@anuff.com wrote: Seeing this error on the latest build with code that worked fine previously. Any ideas? 2010-08-27 17:24:45,037 ERROR (pool-1-thread-2

Errors on CF with index

2010-08-17 Thread Ed Anuff
I'm finding that once I add an index to a column family, I start getting exceptions as I try to add rows to it. It works fine if I don't define the column metadata. Any ideas what would cause this? ERROR 12:44:21,477 Error in ThreadPoolExecutor java.lang.RuntimeException:

Re: Errors on CF with index

2010-08-17 Thread Ed Anuff
Yup, that's it, r986486 on Table.java made the problem go away, talk about great timing :) On Tue, Aug 17, 2010 at 2:38 PM, Eric Evans eev...@rackspace.com wrote: On Tue, 2010-08-17 at 14:04 -0700, Ed Anuff wrote: I'm finding that once I add an index to a column family that I start

Re: Data Modeling Conundrum

2010-05-08 Thread Ed Anuff
:42 PM, Ed Anuff wrote: Sorry, missed that. I'm not sure if there's a cleaner way than using the approaches you've looked at, hopefully someone else has an answer. How big is N and do you need to keep more than N around? On Sat, May 8, 2010 at 10:26 AM, William Ashley wash...@gmail.com wrote

Re: Is SuperColumn necessary?

2010-05-05 Thread Ed Anuff
, Ed Anuff e...@anuff.com wrote: It might make sense to create a CompositeType subclass of AbstractType for the purpose of constructing and comparing these types of composite column names so that you could more easily do that sort of thing rather than having to concatenate into one big string
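A small Python sketch of why component-wise comparison can be preferable to one concatenated string. Tuples stand in for composite column names here; this illustrates the ordering issue and is not the CompositeType implementation.

```python
pairs = [("item", 100), ("item", 9), ("item", 10)]

# Concatenated names compare character by character.
as_strings = sorted(f"{name}:{n}" for name, n in pairs)

# Composite names compare component by component, each with its own type.
as_tuples = sorted(pairs)

print(as_strings)
print(as_tuples)
```

With concatenation, the numeric component sorts lexically unless it is padded, while the tuple form keeps numeric order without any padding convention.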

Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Ed Anuff
Assuming a ColumnFamily with a CompareWith of TimeUUIDType, is it possible to call get_slice with an arbitrary date range? How would valid values for the start and finish attributes of the slice range be constructed? Thanks Ed
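One possible way to build such start and finish values, sketched in Python: construct version 1 UUIDs whose timestamp field encodes the boundary time and whose remaining bits are all zeros or all ones. The helper `time_uuid_bound` is hypothetical, and whether the filler bits act as true lowest and highest values depends on how the comparator orders UUIDs with equal timestamps, which is assumed here rather than verified.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_uuid_bound(dt, lowest):
    """Version 1 UUID for `dt` with clock/node bits all 0s or all 1s."""
    delta = dt - UNIX_EPOCH
    seconds = delta.days * 86400 + delta.seconds
    ts = seconds * 10_000_000 + delta.microseconds * 10 + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000  # version 1
    if lowest:
        rest = (0x80, 0x00, 0x000000000000)  # RFC 4122 variant, zeros
    else:
        rest = (0xBF, 0xFF, 0xFFFFFFFFFFFF)  # RFC 4122 variant, ones
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version) + rest)


start = time_uuid_bound(datetime(2010, 4, 27, tzinfo=timezone.utc), lowest=True)
finish = time_uuid_bound(datetime(2010, 4, 28, tzinfo=timezone.utc), lowest=False)
print(start, finish)
```

The two values would then be passed as the start and finish of the slice range for the day in question.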

Re: Querying by date range when using TimeUUIDType ColumnFamily?

2010-04-27 Thread Ed Anuff
Yes, Lucas was correct about the nature of my original question. I'm glad to hear that Justin's solution works, it makes for a much simpler schema. Ed On Tue, Apr 27, 2010 at 3:06 PM, Lucas Di Pentima lu...@di-pentima.com.ar wrote: On 27/04/2010, at 18:23, Lee Parker wrote: I have