Prashant had many other points out there whereby you don't need your second approach. I guess that is what I was referring to. Avinash
On Tue, Mar 24, 2009 at 8:19 PM, Jun Rao <[email protected]> wrote: > Prashant, > > I forgot about another point you mentioned. > > In the new approach, carving out a chunk of data by hash (needed for node > removal/addition) may not be efficient. In the worse case, we have to make a > full scan of the data. It is possible to make it more efficient by following > the strategy that you guys implemented when using the random hash function: > prefixing each index key with the hash value. On the other hand, I am > wondering if we really need to worry about the performance on > adding/removing nodes. These are infrequent events and are also > non-blocking. > > > Jun > IBM Almaden Research Center > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > [email protected] > > [image: Inactive hide details for Prashant Malik <[email protected]>] > Prashant Malik <[email protected]> > > > > *Prashant Malik <[email protected]>* > > 03/24/2009 11:34 AM > Please respond to > [email protected] > > > > To > > [email protected] > cc > > > Subject > > Re: secondary index support in Cassandra > Some questions Iline > > On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao <[email protected]> wrote: > > > > > > > We have an application that has groups and entities. A group has many > > entities and an entity has a bunch of (attribute, value) pairs. A common > > access pattern is to select some number of entities within a group with > > attribute X equals to x and ordered by attribute Y. For efficiency, we > want > > to build a secondary index for each group and collocate a group and its > > secondary index on the same node. Our current approach is to map a group > to > > a row in Cassandra and each entity to a column in a column family (CF). > > Within the same row, we use a separate CF (ordered by name) to implement > a > > secondary index, say on attribute X and Y. In this family, each column > name > > has the form of X:x:Y:y:entityID. We extended the get_slice() function so > > that it can get a slice of columns starting from a given column. The > > extended function uses the column-level index to locate the starting > column > > quickly. (We'd be happy to contribute this extension back to Cassandra if > > people find this useful). Using the extended get_slice(), we were able to > > access the entities through the simulated secondary index. > > > > We see a couple of problems with the current approach. First, our > > application has to maintain the index. This is inefficient and could > leave > > the index inconsistent when failure occurs. Second, mapping each entity > to > > a column may not be a good idea. Often, there is some sort of locking for > > each row access. Putting many entities per row limits concurrency. Today, > > in Cassandra, a full row is deserialized into memory during compaction. > > This limits the number of entities that can be put in a single row. Also, > > intuitively, an entity is more naturally represented as a row with > > attributes stored as columns. > > > Prashant > > 1. Application can send the index and the entityId update in the same > write , a write per row is always atomic given that teh index and the data > have teh same key in the above > case the index will not be out of sync. > 2. The maintainance of index by app can be moved into cassandra and I > agree with fact that you can add support for it by a built in special CF > which you have to do in either of the approaches you are taking. > Infact in the first approach that you are taking it will be easier > to move the indexes in case of adding nodes to the cluster and when files > are split and data is sent over. In the second approach this > process could get comlicated. > 3. There is no locking for row access in cassandra. > 4. Compactions can be opmtimized for name sorted columns , this is one > of the workitems we have where we do not deserialize the entire row in > compactiopn but only do it in slices , this is easily achievable for > name sorted columns. > 5. The entity can still be represented naturally as a supercolumn where > each of the super column name is the entity Id and each of the columns in > the supercolumn are attribute value pairs. > 6. How many entities per groupid are we talking about here ? why do > you feel concurrency is limited by entities per row ? > > > > > > To address the above problems, we are thinking of the following new > > implementation. Each entity is mapped to a row in Cassandra and uses a > > two-part key (groupID, entityID). We use the groupID to hash an entity to > a > > node. This way, all entities for a group will be collocated in the same > > node. We then define a special CF to serve as the secondary index. In the > > definition, we specify what entity attributes need to be indexed and in > > what order. Within a node, this special CF will index all rows stored > > locally. Every time we insert a new entity, the server automatically > > extracts the index key based on the index definition (for example, the > > index key can be of the form "hash(rowkey):attribute1:attribute2:rowkey) > > and add the index entry to the special CF. We can then access the > entities > > using an extended version of the query language in Cassandra. For > example, > > if we issue the following query and there is an index defined by > > (attributeX, attributeY), the query can be evaluated using the index in > the > > special CF. (Note that AppEngine supports this flavor of queries.) > > > > select attributeZ > > from ROWS(HASH = hash(groupID)) > > where attributeX="x" > > order by attributeY desc > > limit 50 > > > > We are in the middle of prototyping this approach. We'd like to hear if > > other people are interested in this too or if people think there are > better > > alternatives. > > > > > Prashant > > 1. In this approach I see a number of problems during addition , removal > and moving of new nodes. These problems are not impossible to solve but > just harder and inefficient with the above approach. > 2. What are the wins of this approach over the other is not clear to me. > could you please highlight those. > > > > > > Jun > > IBM Almaden Research Center > > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > > > [email protected] > >
