PerRowSecondaryIndex Multiple Load

Adam Holmberg Thu, 31 May 2012 14:46:58 -0700

I've been studying and experimenting with the SecondaryIndex API,
specifically by extending the PerRowSecondaryIndex class.


I understand from this recent thread
<http://mail-archives.apache.org/mod_mbox/cassandra-dev/201205.mbox/%3CCAMYB=b6c9HTDgOFHQS-UwS4UF2a6NiMs3+C++iG3M8z4xgzn=g...@mail.gmail.com%3E>
that
this feature is not yet widely used, but I was hoping someone could
shed some light on its intended concept of operation:

My intuition was to attach my custom index to every column in the column
family whose mutation should trigger an update of this index. This has the
desired effect for an already 'built' index as new row mutations arrive.
What I'm confused about is what happens when the index is built for the
first time. What I'm seeing is that a separate asynchronous build is kicked
off for every column to which this index is attached, which is obviously
undesirable.

It's plain to see why this happens when following the SecondaryIndexManager
reload/addIndexedColumn routines. What I'm wondering now is whether there is
room for improvement here:

Should the Manager wait to initiate an index build until the last column
has been added to a given rowLevelIndex? Or is the onus on the
PerRowSecondaryIndex implementation to 'fool' the manager into bypassing
the build until the last column is added?

My gut says the former would be preferred, since the latter could be a
fragile use of the interface, but I'm just getting into this area and maybe
I'm thinking about this the wrong way.
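
To make the first option concrete, here is a rough sketch of the difference
between the two behaviours. It is a toy stand-in, not the real
SecondaryIndexManager code: ToyIndexManager, addIndexedColumnEager,
addIndexedColumnDeferred and finishReload are all hypothetical names, and an
"index build" is reduced to incrementing a counter.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names come from Cassandra's API.
class ToyIndexManager {
    // Columns attached to each row-level index, keyed by a hypothetical index name.
    private final Map<String, Set<String>> columnsByIndex = new HashMap<>();
    // Row-level indexes that gained columns and still need their one build.
    private final Set<String> pendingBuild = new LinkedHashSet<>();
    private int buildsStarted = 0;

    // Behaviour described above: each attached column starts its own build.
    void addIndexedColumnEager(String indexName, String column) {
        columnsByIndex.computeIfAbsent(indexName, k -> new LinkedHashSet<>()).add(column);
        buildsStarted++;
    }

    // Proposed alternative: only register the column and remember the index.
    void addIndexedColumnDeferred(String indexName, String column) {
        columnsByIndex.computeIfAbsent(indexName, k -> new LinkedHashSet<>()).add(column);
        pendingBuild.add(indexName);
    }

    // Called once after every column has been added: one build per index.
    void finishReload() {
        buildsStarted += pendingBuild.size();
        pendingBuild.clear();
    }

    int buildsStarted() {
        return buildsStarted;
    }
}
```

With three columns attached to the same index, the eager path counts three
builds, while the deferred path counts one after finishReload() is called.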

Any input would be appreciated.

Regards,
Adam Holmberg
