Comments inline. On Tue, Mar 24, 2009 at 6:53 PM, Jun Rao <[email protected]> wrote:
> > Prashant, > > Thanks for the comments. They are quite useful. Let me try to address some > of the points that you made. > > 1. It is true that in our current implementation, we can glue the changes > on both the data and the index in one batch_update() call. This way, the > data and the index will be maintained synchronously. However, maintaining > the index on the server is likely more efficient since there is less > communication overhead. You seem to agree with this. [Avinash] You can update multiple column families for a single key in one mutation. > > > 2. Cassandra currently doesn't acquire row-lock for row accesses. However, > the implication is that a reader may see partial updates of a row. For > example, suppose that a writer updates two columns in different CFs. Then, > it is possible for a concurrent reader to see the update on one column, but > not the other one. For some applications, row-level consistency could be > important. It's probably for this reason, in HBase, a region server > acquires a row lock for every read and write. [Avinash] Updates to a single row within a machine are atomic. Which means what you are stating will not happen. Writes and reads will be serialized at the Memtable. > > > 3. For our current application, the size of all entities in a group is not > too large and likely fits within the capacity of a single node. However, > for other applications, being able to scale a group to more than a node > could be useful. Storing a group within a single row will prevent scaling > out the group. [Avinash] I guess the question is how many entities do you envision in a group. What do you mean by fitting into one node? > > > Jun > IBM Almaden Research Center > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > [email protected] > > > Prashant Malik <[email protected]> wrote on 03/24/2009 11:34:51 AM: > > > Some questions Iline > > > > On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao <[email protected]> > wrote: > > > > > > > > > > > We have an application that has groups and entities. A group has many > > > entities and an entity has a bunch of (attribute, value) pairs. A > common > > > access pattern is to select some number of entities within a group with > > > attribute X equals to x and ordered by attribute Y. For efficiency, we > want > > > to build a secondary index for each group and collocate a group and its > > > secondary index on the same node. Our current approach is to map a > group to > > > a row in Cassandra and each entity to a column in a column family (CF). > > > Within the same row, we use a separate CF (ordered by name) to > implement a > > > secondary index, say on attribute X and Y. In this family, each column > name > > > has the form of X:x:Y:y:entityID. We extended the get_slice() function > so > > > that it can get a slice of columns starting from a given column. The > > > extended function uses the column-level index to locate the starting > column > > > quickly. (We'd be happy to contribute this extension back to Cassandra > if > > > people find this useful). Using the extended get_slice(), we were able > to > > > access the entities through the simulated secondary index. > > > > > > We see a couple of problems with the current approach. First, our > > > application has to maintain the index. This is inefficient and could > leave > > > the index inconsistent when failure occurs. Second, mapping each > entity to > > > a column may not be a good idea. Often, there is some sort of locking > for > > > each row access. Putting many entities per row limits concurrency. > Today, > > > in Cassandra, a full row is deserialized into memory during compaction. > > > This limits the number of entities that can be put in a single row. > Also, > > > intuitively, an entity is more naturally represented as a row with > > > attributes stored as columns. > > > > > > Prashant > > > > 1. Application can send the index and the entityId update in the same > > write , a write per row is always atomic given that teh index and the > data > > have teh same key in the above > > case the index will not be out of sync. > > 2. The maintainance of index by app can be moved into cassandra and I > > agree with fact that you can add support for it by a built in special CF > > which you have to do in either of the approaches you are taking. > > Infact in the first approach that you are taking it will be > easier > > to move the indexes in case of adding nodes to the cluster and when files > > are split and data is sent over. In the second approach this > > process could get comlicated. > > 3. There is no locking for row access in cassandra. > > 4. Compactions can be opmtimized for name sorted columns , this is > one > > of the workitems we have where we do not deserialize the entire row in > > compactiopn but only do it in slices , this is easily achievable for > > name sorted columns. > > 5. The entity can still be represented naturally as a supercolumn > where > > each of the super column name is the entity Id and each of the columns in > > the supercolumn are attribute value pairs. > > 6. How many entities per groupid are we talking about here ? why do > > you feel concurrency is limited by entities per row ? > > > > > > > > > > To address the above problems, we are thinking of the following new > > > implementation. Each entity is mapped to a row in Cassandra and uses a > > > two-part key (groupID, entityID). We use the groupID to hash an entity > to a > > > node. This way, all entities for a group will be collocated in the same > > > node. We then define a special CF to serve as the secondary index. In > the > > > definition, we specify what entity attributes need to be indexed and > in > > > what order. Within a node, this special CF will index all rows stored > > > locally. Every time we insert a new entity, the server automatically > > > extracts the index key based on the index definition (for example, the > > > index key can be of the form "hash > (rowkey):attribute1:attribute2:rowkey) > > > and add the index entry to the special CF. We can then access the > entities > > > using an extended version of the query language in Cassandra. For > example, > > > if we issue the following query and there is an index defined by > > > (attributeX, attributeY), the query can be evaluated using the index in > the > > > special CF. (Note that AppEngine supports this flavor of queries.) > > > > > > select attributeZ > > > from ROWS(HASH = hash(groupID)) > > > where attributeX="x" > > > order by attributeY desc > > > limit 50 > > > > > > We are in the middle of prototyping this approach. We'd like to hear if > > > other people are interested in this too or if people think there are > better > > > alternatives. > > > > > > > > > Prashant > > > > 1. In this approach I see a number of problems during addition , > removal > > and moving of new nodes. These problems are not impossible to solve but > > just harder and inefficient with the above approach. > > 2. What are the wins of this approach over the other is not clear to > me. > > could you please highlight those. > > > > > > > > > > Jun > > > IBM Almaden Research Center > > > K55/B1, 650 Harry Road, San Jose, CA 95120-6099 > > > > > > [email protected] >
