This seems like a good place to try out Lucene's FST [1] data structure,
which would allow more keys to be loaded into RAM (for more granular
seeks), along with their positions.  Lucene uses it for the terms
dictionary, where its use made for nice gains in efficiency.  The
gains could potentially be much larger if the FST were used in
Cassandra.

This is an example of how it is used [2].  Memory usage is very
efficient because the whole structure is a single byte[], and both the
keys and the position values are compressed.
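
To make the idea concrete, here is a rough sketch in plain Java.  It is
not Lucene's FST and not Cassandra code: it only shares prefixes between
adjacent sorted keys (an FST also shares suffixes), and it does a linear
scan where an FST walks arcs.  The class name PackedIndexSummary and its
methods are hypothetical, made up for illustration.  What it does show
is the layout I mean: sorted keys and their positions packed into one
byte[], with the keys prefix-compressed and the positions stored as
variable-length deltas.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: sorted keys plus positions packed into a single byte[]. */
class PackedIndexSummary {
    private final byte[] data;  // all entries back to back
    private final int count;    // number of entries

    private PackedIndexSummary(byte[] data, int count) {
        this.data = data;
        this.count = count;
    }

    /**
     * Keys must be sorted ascending in unsigned UTF-8 byte order and
     * positions must be non-decreasing (both are assumptions of this sketch).
     */
    static PackedIndexSummary build(List<String> sortedKeys, long[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        long prevPos = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            byte[] key = sortedKeys.get(i).getBytes(StandardCharsets.UTF_8);
            // Length of the prefix shared with the previous key.
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            writeVLong(out, shared);                      // shared prefix length
            writeVLong(out, key.length - shared);         // suffix length
            out.write(key, shared, key.length - shared);  // suffix bytes only
            writeVLong(out, positions[i] - prevPos);      // position as a delta
            prev = key;
            prevPos = positions[i];
        }
        return new PackedIndexSummary(out.toByteArray(), sortedKeys.size());
    }

    /** Position of the greatest key <= target, or -1 if every key is greater. */
    long floorPosition(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        byte[] cur = new byte[0];
        long pos = 0;
        long result = -1;
        int[] off = {0};  // read cursor into data
        for (int i = 0; i < count; i++) {
            int shared = (int) readVLong(off);
            int suffixLen = (int) readVLong(off);
            // Rebuild the full key from the previous key's prefix plus the suffix.
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(cur, 0, key, 0, shared);
            System.arraycopy(data, off[0], key, shared, suffixLen);
            off[0] += suffixLen;
            pos += readVLong(off);
            if (Arrays.compareUnsigned(key, t) > 0) {
                break;  // keys are sorted, so nothing later can match
            }
            result = pos;
            cur = key;
        }
        return result;
    }

    int sizeInBytes() {
        return data.length;
    }

    // Variable-length encoding: 7 bits per byte, high bit set means "more follows".
    private static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    private long readVLong(int[] off) {
        long v = 0;
        int shift = 0;
        byte b;
        do {
            b = data[off[0]++];
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v;
    }

    public static void main(String[] args) {
        PackedIndexSummary s = build(
                List.of("apple", "apply", "banana"), new long[] {0L, 100L, 250L});
        System.out.println("floor(avocado) = " + s.floorPosition("avocado"));
        System.out.println("bytes used = " + s.sizeInBytes());
    }
}
```

The point of comparison is the two ArrayLists quoted below: there, every
key and position is a separate object on the heap, while here everything
lives in one array, so more keys could fit in the same amount of RAM.  A
real FST would also avoid the linear scan.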

If you are interested, I can help.  I used the FST on a Hadoop project
to implement a fast map-side range join.

1. http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
2. https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/codecs/VariableGapTermsIndexReader.java

On Wed, Jun 6, 2012 at 12:05 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Implementation is in IndexSummary.java; the core is
>
>    private final ArrayList<Long> positions;
>    private final ArrayList<DecoratedKey> keys;
>
> So no, nothing fancy like prefix compression.
>
> On Wed, Jun 6, 2012 at 11:00 AM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>> I am wondering how this is currently implemented?  Is there prefix 
>> compression?
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
