Prashant,

Thanks for the comments. They are quite useful. Let me try to address some
of the points that you made.

1. It is true that in our current implementation we can bundle the changes
to both the data and the index in one batch_update() call. This way, the
data and the index are maintained synchronously. However, maintaining the
index on the server is likely more efficient, since there is less
communication overhead. You seem to agree with this.
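
For concreteness, here is roughly what such a batch looks like on the
client side. This is only a sketch: the batchUpdate() helper and the
nested-map layout are stand-ins for the actual Thrift call, not the real
Cassandra API.

  import java.util.HashMap;
  import java.util.Map;

  // Sketch: one client-side batch carries both the data column and the
  // index column for the same row key (the group), so both land in one
  // atomic row write. Types and the batchUpdate() signature are
  // illustrative only.
  public class IndexedWrite {

      // mutations: columnFamily -> (columnName -> value)
      static void batchUpdate(String rowKey, Map<String, Map<String, String>> mutations) {
          // In the prototype this is a single batch_update() Thrift call;
          // here we just print what would be sent.
          System.out.println("row " + rowKey + " -> " + mutations);
      }

      public static void main(String[] args) {
          String groupId = "group42";          // row key: one row per group
          String entityId = "entity7";
          String x = "red", y = "2009-03-24";  // values of indexed attributes X, Y

          Map<String, Map<String, String>> mutations = new HashMap<>();

          // Data CF: one column per entity.
          Map<String, String> data = new HashMap<>();
          data.put(entityId, "serialized entity");
          mutations.put("Entities", data);

          // Index CF (name-sorted): composite column name X:x:Y:y:entityID.
          Map<String, String> index = new HashMap<>();
          index.put("X:" + x + ":Y:" + y + ":" + entityId, "");
          mutations.put("EntityIndex", index);

          // One call, one row key: the data and the index stay in sync.
          batchUpdate(groupId, mutations);
      }
  }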

2. Cassandra currently doesn't acquire a row lock for row accesses. The
implication is that a reader may see a partial update of a row. For
example, suppose a writer updates two columns in different CFs; it is then
possible for a concurrent reader to see the update to one column but not
the other. For some applications, row-level consistency can be important.
This is probably why, in HBase, a region server acquires a row lock for
every read and write.
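
To make the anomaly and the fix concrete, here is a toy sketch in plain
Java (not Cassandra or HBase internals). Without the per-row read/write
lock, a reader scheduled between the two put() calls would see the new
value in cf1 next to the old value in cf2:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.locks.ReadWriteLock;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  // Toy illustration of row-level consistency. With the HBase-style
  // per-row lock, a reader sees all or nothing of a two-CF update.
  public class RowLockDemo {
      static final Map<String, String> cf1 = new ConcurrentHashMap<>();
      static final Map<String, String> cf2 = new ConcurrentHashMap<>();
      static final Map<String, ReadWriteLock> rowLocks = new ConcurrentHashMap<>();

      static ReadWriteLock lockFor(String row) {
          return rowLocks.computeIfAbsent(row, r -> new ReentrantReadWriteLock());
      }

      // Writer updates two columns of the same row in different CFs.
      static void write(String row, String v) {
          ReadWriteLock l = lockFor(row);
          l.writeLock().lock();
          try {
              cf1.put(row, v);
              cf2.put(row, v);  // without the lock, a reader could run here
          } finally {
              l.writeLock().unlock();
          }
      }

      static String read(String row) {
          ReadWriteLock l = lockFor(row);
          l.readLock().lock();
          try {
              return cf1.get(row) + "/" + cf2.get(row);  // always consistent
          } finally {
              l.readLock().unlock();
          }
      }

      public static void main(String[] args) {
          write("row1", "v1");
          System.out.println(read("row1"));  // prints v1/v1, never v1/null
      }
  }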

3. For our current application, the set of all entities in a group is not
too large and likely fits within the capacity of a single node. However,
for other applications, being able to scale a group beyond a single node
could be useful, and storing a group within a single row prevents scaling
the group out.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

[email protected]


Prashant Malik <[email protected]> wrote on 03/24/2009 11:34:51 AM:

> Some questions inline
>
> On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao <[email protected]> wrote:
>
> >
> >
> > We have an application that has groups and entities. A group has many
> > entities, and an entity has a bunch of (attribute, value) pairs. A
> > common access pattern is to select some number of entities within a
> > group with attribute X equal to x, ordered by attribute Y. For
> > efficiency, we want to build a secondary index for each group and
> > collocate a group and its secondary index on the same node. Our
> > current approach is to map a group to a row in Cassandra and each
> > entity to a column in a column family (CF). Within the same row, we
> > use a separate CF (ordered by name) to implement a secondary index,
> > say on attributes X and Y. In this family, each column name has the
> > form X:x:Y:y:entityID. We extended the get_slice() function so that
> > it can get a slice of columns starting from a given column. The
> > extended function uses the column-level index to locate the starting
> > column quickly. (We'd be happy to contribute this extension back to
> > Cassandra if people find it useful.) Using the extended get_slice(),
> > we were able to access the entities through the simulated secondary
> > index.
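
(Inline, for concreteness: a toy model of the layout just described. The
TreeMap stands in for the name-sorted CF, and getSliceFrom() mimics our
extended get_slice(); the names are ours, not the stock API.)

  import java.util.ArrayList;
  import java.util.List;
  import java.util.TreeMap;

  // Toy model of the simulated secondary index: a name-sorted CF whose
  // column names have the form X:x:Y:y:entityID.
  public class SimulatedIndex {
      // column name -> column value (values unused by the index)
      static final TreeMap<String, String> indexCf = new TreeMap<>();

      static void indexEntity(String x, String y, String entityId) {
          indexCf.put("X:" + x + ":Y:" + y + ":" + entityId, "");
      }

      // Return up to 'count' column names starting at the given column
      // name, in name order -- what the extended get_slice() provides.
      // A real caller would also stop once names leave the X prefix.
      static List<String> getSliceFrom(String startColumn, int count) {
          List<String> out = new ArrayList<>();
          for (String name : indexCf.tailMap(startColumn).keySet()) {
              if (out.size() == count) break;
              out.add(name);
          }
          return out;
      }

      public static void main(String[] args) {
          indexEntity("red", "2009-01-01", "e1");
          indexEntity("red", "2009-02-01", "e2");
          indexEntity("blue", "2009-01-15", "e3");

          // All entities with X=red, ordered by Y: scan from the prefix.
          System.out.println(getSliceFrom("X:red:", 10));  // e1 then e2
      }
  }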
> >
> > We see a couple of problems with the current approach. First, our
> > application has to maintain the index. This is inefficient and could
> > leave the index inconsistent when a failure occurs. Second, mapping
> > each entity to a column may not be a good idea. Often, there is some
> > sort of locking for each row access, so putting many entities in one
> > row limits concurrency. Today, in Cassandra, a full row is
> > deserialized into memory during compaction, which limits the number
> > of entities that can be put in a single row. Also, intuitively, an
> > entity is more naturally represented as a row with attributes stored
> > as columns.
>
>
> Prashant
>
>    1. The application can send the index and the entityId update in the
> same write. A write per row is always atomic, so given that the index
> and the data have the same key in the above case, the index will not be
> out of sync.
>    2. The maintenance of the index by the app can be moved into
> Cassandra, and I agree that you can add support for it with a built-in
> special CF, which you have to do in either of the approaches you are
> taking. In fact, with the first approach it will be easier to move the
> indexes when nodes are added to the cluster and files are split and
> data is sent over. With the second approach this process could get
> complicated.
>    3. There is no locking for row access in Cassandra.
>    4. Compactions can be optimized for name-sorted columns. This is
> one of the work items we have, where we do not deserialize the entire
> row in compaction but only do it in slices. This is easily achievable
> for name-sorted columns.
>    5. The entity can still be represented naturally as a supercolumn,
> where the supercolumn name is the entity ID and the columns in the
> supercolumn are attribute-value pairs (see the sketch below).
>    6. How many entities per groupId are we talking about here? Why do
> you feel concurrency is limited by entities per row?
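
(Inline: my reading of the supercolumn layout suggested in point 5, as a
toy sketch. The nested maps and all names are illustrative only.)

  import java.util.HashMap;
  import java.util.Map;

  // Sketch of point 5: one row per group, one supercolumn per entity,
  // one subcolumn per attribute. Nested maps stand in for the layout.
  public class SuperColumnLayout {
      public static void main(String[] args) {
          // supercolumn name (entityID) -> (attribute -> value)
          Map<String, Map<String, String>> entitiesSuperCf = new HashMap<>();

          Map<String, String> entity7 = new HashMap<>();
          entity7.put("X", "red");
          entity7.put("Y", "2009-03-24");
          entitiesSuperCf.put("entity7", entity7);

          // The whole group is still one row: row key = groupID,
          // its columns are the supercolumns above.
          Map<String, Map<String, Map<String, String>>> row = new HashMap<>();
          row.put("group42", entitiesSuperCf);

          System.out.println(row);
      }
  }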
>
>
> >
> > To address the above problems, we are thinking of the following new
> > implementation. Each entity is mapped to a row in Cassandra, using a
> > two-part key (groupID, entityID). We use the groupID to hash an
> > entity to a node; this way, all entities for a group will be
> > collocated on the same node. We then define a special CF to serve as
> > the secondary index. In the definition, we specify which entity
> > attributes need to be indexed and in what order. Within a node, this
> > special CF indexes all rows stored locally. Every time we insert a
> > new entity, the server automatically extracts the index key based on
> > the index definition (for example, the index key can be of the form
> > hash(rowkey):attribute1:attribute2:rowkey) and adds the index entry
> > to the special CF. We can then access the entities using an extended
> > version of the query language in Cassandra. For example, if we issue
> > the following query and there is an index defined on (attributeX,
> > attributeY), the query can be evaluated using the index in the
> > special CF. (Note that AppEngine supports this flavor of queries.)
> >
> > select attributeZ
> > from ROWS(HASH = hash(groupID))
> > where attributeX="x"
> > order by attributeY desc
> > limit 50
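
(Inline: a sketch of the automatic index maintenance described above.
The class and method names are placeholders, and the hash component here
is taken over the group part of the key, matching what ROWS(HASH =
hash(groupID)) filters on in the query.)

  import java.util.HashMap;
  import java.util.Map;
  import java.util.TreeMap;

  // Sketch: on each insert, the server derives an index key of the form
  // hash:attr1:attr2:rowkey from the index definition and adds it to
  // the special, name-sorted index CF maintained locally per node.
  public class AutoIndexer {
      static final TreeMap<String, String> specialCf = new TreeMap<>();

      // Index definition: which attributes are indexed, in what order.
      static final String[] indexedAttrs = {"attributeX", "attributeY"};

      static void insertEntity(String groupId, String entityId,
                               Map<String, String> attrs) {
          String rowKey = groupId + ":" + entityId;
          // ... the entity row itself would be written here ...

          // Extract the index key per the definition and add the entry.
          StringBuilder key = new StringBuilder(Integer.toHexString(groupId.hashCode()));
          for (String a : indexedAttrs) {
              key.append(':').append(attrs.get(a));
          }
          key.append(':').append(rowKey);
          specialCf.put(key.toString(), "");
      }

      public static void main(String[] args) {
          Map<String, String> attrs = new HashMap<>();
          attrs.put("attributeX", "x");
          attrs.put("attributeY", "y");
          insertEntity("group42", "entity7", attrs);

          // The sample query scans this CF from the prefix
          // hash(groupID):x: and returns entries in attributeY order
          // (reversed for "order by attributeY desc").
          System.out.println(specialCf);
      }
  }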
> >
> > We are in the middle of prototyping this approach. We'd like to hear
> > if other people are interested in this too, or if people think there
> > are better alternatives.
> >
>
>
> Prashant
>
>    1. With this approach I see a number of problems during the
> addition, removal, and moving of nodes. These problems are not
> impossible to solve, but they are harder and less efficient with the
> above approach.
>    2. It is not clear to me what the wins of this approach are over
> the other; could you please highlight those?
>
>
> >
> > Jun
> > IBM Almaden Research Center
> > K55/B1, 650 Harry Road, San Jose, CA  95120-6099
> >
> > [email protected]
