The compaction optimization that Prashant mentioned is likely to solve
many of the problems that Jun brings up.

We were thinking of tackling this problem ... I've opened a ticket in
JIRA (https://issues.apache.org/jira/browse/CASSANDRA-16)

Avinash, Prashant -- If you guys are already working on it, feel free
to assign it to yourself. Otherwise we'll sketch out a plan and send
it out, if the community agrees on the idea, we can start hacking
away.

Sandeep


On Wed, Mar 25, 2009 at 10:50 AM, Jun Rao <jun...@almaden.ibm.com> wrote:
>
> Some comments inlined below.
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>
> jun...@almaden.ibm.com
>
>
> Avinash Lakshman <avinash.laksh...@gmail.com> wrote on 03/24/2009 10:08:45
> PM:
>
>> Comments inline.
>>
>> On Tue, Mar 24, 2009 at 6:53 PM, Jun Rao <jun...@almaden.ibm.com> wrote:
>>
>> >
>> > Prashant,
>> >
>> > Thanks for the comments. They are quite useful. Let me try to address
> some
>> > of the points that you made.
>> >
>> > 1. It is true that in our current implementation, we can glue the
> changes
>> > on both the data and the index in one batch_update() call. This way,
> the
>> > data and the index will be maintained synchronously. However,
> maintaining
>> > the index on the server is likely more efficient since there is less
>> > communication overhead. You seem to agree with this.
>>
>>
>> [Avinash] You can update multiple column families for a single key in one
>> mutation.
>>
>> >
>> >
>> > 2. Cassandra currently doesn't acquire row-lock for row accesses.
> However,
>> > the implication is that a reader may see partial updates of a row. For
>> > example, suppose that a writer updates two columns in different CFs.
> Then,
>> > it is possible for a concurrent reader to see the update on one column,
> but
>> > not the other one. For some applications, row-level consistency could
> be
>> > important. It's probably for this reason, in HBase, a region server
>> > acquires a row lock for every read and write.
>>
>>
>> [Avinash] Updates to a single row within a machine are atomic. Which
> means
>> what you are stating will not happen. Writes and reads will be serialized
> at
>> the Memtable.
>
> This problem doesn't show up in Cassandra today because there is no method
> that can read columns from different CFs in a row. If there were such a
> method, it would be hard to enforce that a reader always sees a complete
> update (updating multiple CFs) without some sort of row locks.
>
>>
>> >
>> >
>> > 3. For our current application, the size of all entities in a group is
> not
>> > too large and likely fits within the capacity of a single node.
> However,
>> > for other applications, being able to scale a group to more than a node
>> > could be useful. Storing a group within a single row will prevent
> scaling
>> > out the group.
>>
>> [Avinash] I guess the question is how many entities do you envision in a
>> group. What do you mean by fitting into one node?
>>
>
> A large group may not fit in memory, but should fit in a commodity disk.
> The compaction optimization Prashant mentioned will definitely make our
> current approach more feasible.
>
> However, in general, I am a bit concerned about putting too much stuff
> within a row. A row is a unit that has finite capacity and a user shouldn't
> expect to put an infinite number of columns within a row. I actually like
> the current assumption in Cassandra that a row has to fit in memory since
> it simplifies the implementation. On the other hand, a table can have
> arbitrary capacity (one just need to provision enough nodes in the cluster)
> and it can have as many rows as you want.
>
>> >
>> >
>> > Jun
>> > IBM Almaden Research Center
>> > K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>> >
>> > jun...@almaden.ibm.com
>> >
>> >
>> > Prashant Malik <pma...@gmail.com> wrote on 03/24/2009 11:34:51 AM:
>> >
>> > > Some questions Iline
>> > >
>> > > On Tue, Mar 24, 2009 at 10:21 AM, Jun Rao <jun...@almaden.ibm.com>
>> > wrote:
>> > >
>> > > >
>> > > >
>> > > > We have an application that has groups and entities. A group has
> many
>> > > > entities and an entity has a bunch of (attribute, value) pairs. A
>> > common
>> > > > access pattern is to select some number of entities within a group
> with
>> > > > attribute X equals to x and ordered by attribute Y. For efficiency,
> we
>> > want
>> > > > to build a secondary index for each group and collocate a group and
> its
>> > > > secondary index on the same node. Our current approach is to map a
>> > group to
>> > > > a row in Cassandra and each entity to a column in a column family
> (CF).
>> > > > Within the same row, we use a separate CF (ordered by name) to
>> > implement a
>> > > > secondary index, say on attribute X and Y. In this family, each
> column
>> > name
>> > > > has the form of X:x:Y:y:entityID. We extended the get_slice()
> function
>> > so
>> > > > that it can get a slice of columns starting from a given column.
> The
>> > > > extended function uses the column-level index to locate the
> starting
>> > column
>> > > > quickly. (We'd be happy to contribute this extension back to
> Cassandra
>> > if
>> > > > people find this useful). Using the extended get_slice(), we were
> able
>> > to
>> > > > access the entities through the simulated secondary index.
>> > > >
>> > > > We see a couple of problems with the current approach. First, our
>> > > > application has to maintain the index. This is inefficient and
> could
>> > leave
>> > > > the index inconsistent when failure occurs.  Second, mapping each
>> > entity to
>> > > > a column may not be a good idea. Often, there is some sort of
> locking
>> > for
>> > > > each row access. Putting many entities per row limits concurrency.
>> > Today,
>> > > > in Cassandra, a full row is deserialized into memory during
> compaction.
>> > > > This limits the number of entities that can be put in a single row.
>> > Also,
>> > > > intuitively, an entity is more naturally represented as a row with
>> > > > attributes stored as columns.
>> > >
>> > >
>> > > Prashant
>> > >
>> > >    1. Application can send the index  and the entityId update in the
> same
>> > > write , a write per row is always atomic given that teh index and the
>> > data
>> > > have teh same key in the above
>> > >        case the index will not be out of sync.
>> > >     2. The maintainance of index by app can be moved into cassandra
> and I
>> > > agree with fact that you can add support for it by a built in special
> CF
>> > > which you have to do in either of the approaches you are taking.
>> > >         Infact in the first approach that you are taking it will be
>> > easier
>> > > to move the indexes in case of adding nodes to the cluster and when
> files
>> > > are split and data is sent over. In the second approach this
>> > >         process could get comlicated.
>> > >     3. There is no locking for row access in cassandra.
>> > >     4. Compactions can be opmtimized for name sorted columns , this
> is
>> > one
>> > > of the workitems we have where we do not deserialize the entire row
> in
>> > > compactiopn but only do it in slices , this is easily achievable for
>> > >         name sorted columns.
>> > >     5. The entity can still be represented naturally as a supercolumn
>> > where
>> > > each of the super column name is the entity Id and each of the
> columns in
>> > > the supercolumn are attribute value pairs.
>> > >     6.  How many entities per  groupid are we talking about here ?
> why do
>> > > you feel concurrency is limited by entities per row ?
>> > >
>> > >
>> > > >
>> > > > To address the above problems, we are thinking of the following new
>> > > > implementation. Each entity is mapped to a row in Cassandra and
> uses a
>> > > > two-part key (groupID, entityID). We use the groupID to hash an
> entity
>> > to a
>> > > > node. This way, all entities for a group will be collocated in the
> same
>> > > > node. We then define a special CF to serve as the secondary index.
> In
>> > the
>> > > > definition, we specify what entity attributes need to be indexed
> and
>> > in
>> > > > what order. Within a node, this special CF will index all rows
> stored
>> > > > locally. Every time we insert a new entity, the server
> automatically
>> > > > extracts the index key based on the index definition (for example,
> the
>> > > > index key can be of the form "hash
>> > (rowkey):attribute1:attribute2:rowkey)
>> > > > and add the index entry to the special CF. We can then access the
>> > entities
>> > > > using an extended version of the query language in Cassandra. For
>> > example,
>> > > > if we issue the following query and there is an index defined by
>> > > > (attributeX, attributeY), the query can be evaluated using the
> index in
>> > the
>> > > > special CF. (Note that AppEngine supports this flavor of queries.)
>> > > >
>> > > > select attributeZ
>> > > > from ROWS(HASH = hash(groupID))
>> > > > where attributeX="x"
>> > > > order by attributeY desc
>> > > > limit 50
>> > > >
>> > > > We are in the middle of prototyping this approach. We'd like to
> hear if
>> > > > other people are interested in this too or if people think there
> are
>> > better
>> > > > alternatives.
>> > > >
>> > >
>> > >
>> > > Prashant
>> > >
>> > >    1. In this approach I see a number of problems during addition ,
>> > removal
>> > > and moving of  new nodes. These problems are not impossible to solve
> but
>> > > just harder and inefficient with the above approach.
>> > >    2. What are the wins of this approach over the other is not clear
> to
>> > me.
>> > > could you please highlight those.
>> > >
>> > >
>> > > >
>> > > > Jun
>> > > > IBM Almaden Research Center
>> > > > K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>> > > >
>> > > > jun...@almaden.ibm.com
>> >

Reply via email to