I think Prashant brought up some very good points. A response would be very helpful in understanding the best way to do this.

Avinash

On Tue, Mar 24, 2009 at 6:33 PM, Jun Rao <[email protected]> wrote:
> Jonathan,
>
> Thanks for the comments.
>
> I agree with your first point. It will be useful to plug in a user-defined
> index analyzer. The analyzer takes a row with the indexed columns and can
> extract whatever index keys that it likes. This way, an application can
> choose what to index for different data types.
>
> As for queries vs. low-level api, we can make both available to the
> application developer. In general, what can be done in a single query may
> have to be translated to multiple low-level api calls. Some apps may prefer
> the former for efficiency.
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> [email protected]
>
> Jonathan Ellis <[email protected]>
> 03/24/2009 10:48 AM
> Please respond to [email protected]
> To: [email protected]
> cc:
> Subject: Re: secondary index support in Cassandra
>
> This adds a lot of complexity but I definitely see people wanting easy
> indexing out of the box. So +1 in principle.
>
> A few high-level comments:
>
> First, for maximum flexibility, you probably want to allow indexes to
> be defined in code. That is, you'd define something like
>
> <ColumnFamily name="foo">
>   <Index generator="com.ibm.cassandra.indexGenerator"/>
> </ColumnFamily>
>
> and allow index generators to be loaded at runtime. Nobody else is
> going to need the specific case of
> hash(rowkey):attribute1:attribute2:rowkey so abstract that out and
> make it pluggable for whatever weird-ass requirements people have.
>
> Second, I'm not a fan of queries by parsing strings. The whole rdbms
> world has been moving _away_ from SQL and towards OO interfaces for
> the last 10 years. I like the thrift API for this reason. (It is a
> little clunky in Java, but _everything_ is a little clunky in Java.
> Much better in Python/Ruby/etc.)
>
> Finally, as an implementation detail, Cassandra already does too much
> in-memory when writing and merging sstables. Don't make it worse. :)
>
> -Jonathan
>
> P.S. the partitioner abstraction layer in CASSANDRA-3 will allow you
> to do the per-node grouping you want without weird contortions.
>
> On Tue, Mar 24, 2009 at 11:21 AM, Jun Rao <[email protected]> wrote:
> > To address the above problems, we are thinking of the following new
> > implementation. Each entity is mapped to a row in Cassandra and uses a
> > two-part key (groupID, entityID). We use the groupID to hash an entity
> > to a node. This way, all entities for a group will be collocated in the
> > same node. We then define a special CF to serve as the secondary index.
> > In the definition, we specify what entity attributes need to be indexed
> > and in what order. Within a node, this special CF will index all rows
> > stored locally. Every time we insert a new entity, the server
> > automatically extracts the index key based on the index definition (for
> > example, the index key can be of the form
> > "hash(rowkey):attribute1:attribute2:rowkey) and add the index entry to
> > the special CF. We can then access the entities using an extended
> > version of the query language in Cassandra. For example, if we issue
> > the following query and there is an index defined by (attributeX,
> > attributeY), the query can be evaluated using the index in the special
> > CF. (Note that AppEngine supports this flavor of queries.)
> >
> > select attributeZ
> > from ROWS(HASH = hash(groupID))
> > where attributeX="x"
> > order by attributeY desc
> > limit 50
> >
> > We are in the middle of prototyping this approach. We'd like to hear if
> > other people are interested in this too or if people think there are
> > better alternatives.
> >
> > Jun
> > IBM Almaden Research Center
> > K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> >
> > [email protected]
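
To make the proposal in the quoted thread concrete, here is a minimal sketch of a pluggable index generator and a per-node index that answers the example query. It is not Cassandra code: `attribute_index`, `LocalIndex`, `stable_hash`, and the "groupID:entityID" row-key encoding are all hypothetical names and choices made for illustration. It also assumes the hash component is taken over the group ID (so that the `HASH = hash(groupID)` lookup works), and it ignores updates, deletes, and on-disk storage entirely.

```python
import bisect
import hashlib
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
IndexKey = Tuple[str, ...]
# A pluggable generator maps (row key, row) to zero or more index keys.
IndexGenerator = Callable[[str, Row], List[IndexKey]]


def stable_hash(value: str) -> str:
    """Deterministic hash for illustration; md5 is an arbitrary choice."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:8]


def attribute_index(*attributes: str) -> IndexGenerator:
    """Build a generator emitting (hash(groupID), attr1, ..., row key)."""
    def generate(row_key: str, row: Row) -> List[IndexKey]:
        if any(name not in row for name in attributes):
            return []  # rows missing an indexed attribute are skipped
        # Assumption: row keys are encoded as "groupID:entityID".
        group_id = row_key.split(":", 1)[0]
        values = tuple(row[name] for name in attributes)
        return [(stable_hash(group_id),) + values + (row_key,)]
    return generate


class LocalIndex:
    """Stand-in for the special index CF covering one node's local rows."""

    def __init__(self, generator: IndexGenerator) -> None:
        self.generator = generator
        self.rows: Dict[str, Row] = {}
        self.entries: List[IndexKey] = []  # kept sorted at all times

    def insert(self, row_key: str, row: Row) -> None:
        # Store the row, then add its index entries in sorted position.
        self.rows[row_key] = row
        for key in self.generator(row_key, row):
            bisect.insort(self.entries, key)

    def query(self, group_id: str, first_value: str, limit: int) -> List[Row]:
        """Rows in a group whose first indexed attribute equals
        first_value, ordered by the remaining index attributes descending."""
        prefix = (stable_hash(group_id), first_value)
        start = bisect.bisect_left(self.entries, prefix)
        matches: List[IndexKey] = []
        for key in self.entries[start:]:
            if key[:2] != prefix:
                break  # sorted order: the matching range has ended
            matches.append(key)
        matches.reverse()  # ascending range scan -> descending order
        return [self.rows[key[-1]] for key in matches[:limit]]


index = LocalIndex(attribute_index("attributeX", "attributeY"))
index.insert("g1:e1", {"attributeX": "x", "attributeY": "2", "attributeZ": "a"})
index.insert("g1:e2", {"attributeX": "x", "attributeY": "5", "attributeZ": "b"})
index.insert("g2:e3", {"attributeX": "x", "attributeY": "7", "attributeZ": "c"})
top = [row["attributeZ"] for row in index.query("g1", "x", limit=50)]
```

Passing the generator in as a plain callable is one way to get the pluggability Jonathan asks for: an application that needs a different key layout supplies its own function instead of `attribute_index`. Note that the attribute values here compare as strings, so numeric ordering would need a suitable encoding.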
