Hi Courtney,

You can take a look at Lucandra (http://github.com/tjake/Lucandra), which uses the Lucene API to maintain an inverted index in Cassandra. The README links to a couple of articles and presentations that give more detail on how this is done.
-Jake

On Fri, Sep 3, 2010 at 6:26 AM, Courtney Robinson <sa...@live.co.uk> wrote:
> A few of us are working on a book for Cassandra and have got to the point
> where we (well, I have anyway) wanted to include an example of a non-trivial
> inverted index.
>
> I've been playing around with different ideas on how I could store the data,
> and I've had a look at the previous threads that touched on the subject, but
> with the two or three ideas I've seen on the list, someone always points out
> something in the approach that punches a hole in it.
>
> I've been playing around with the idea of using a ColumnFamily for the index,
> where I store the term as the row key, each column name is a 64-bit long and
> each column value is a doc id. If the column name represents a ranking for the
> doc id it stores and the CompareWith option is LongType, then once a term is
> retrieved, the first x columns would represent the most related docs for that
> term.
>
> I'd go into more detail, but I'm using my phone to write this and I think that
> gets the idea across.
> Of course, my first thought about this is: is it scalable? In a system where
> possibly millions of docs are related to one term, is it a good idea to have
> potentially that many columns in one row, all associated with the one row key,
> which is the term?
>
> I just want to know what others think, whether you have any suggestions or
> have a similar thing implemented that you're able to share.
>
> On a side note, there has been a bit of talk about secondary indexes in 0.7.
> Can anyone shed some light on that, or point me to any presentation or the
> like where it's mentioned, so I can get a better idea of what it's for?
>
> Thanks,
> Courtney
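To make the layout in the quoted message concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra: a sorted list per term stands in for a row whose column names are compared as LongType, and `TermIndex`, `add` and `top_docs` are hypothetical names used only for illustration. It assumes a lower rank means a more related doc, so the first columns in ascending order are the best matches (with a higher-is-better score you would have to store the negated score or read the row in reverse).

```python
import bisect
from collections import defaultdict

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


class TermIndex:
    """Hypothetical in-memory model of the proposed index ColumnFamily.

    Row key = term; column name = 64-bit rank (sorted ascending, like
    LongType); column value = doc id.
    """

    def __init__(self):
        # term -> list of (rank, doc_id) pairs kept sorted by rank
        self._rows = defaultdict(list)

    def add(self, term, rank, doc_id):
        if not LONG_MIN <= rank <= LONG_MAX:
            raise ValueError("rank must fit in a signed 64-bit long")
        row = self._rows[term]
        # (rank,) sorts before any (rank, doc_id), so this finds the
        # first column whose name is >= rank.
        i = bisect.bisect_left(row, (rank,))
        if i < len(row) and row[i][0] == rank:
            # Column names are unique within a row: a write with an
            # existing name replaces the old value.
            row[i] = (rank, doc_id)
        else:
            row.insert(i, (rank, doc_id))

    def top_docs(self, term, count):
        # Equivalent to slicing the first `count` columns of the row.
        return [doc for _, doc in self._rows.get(term, [])[:count]]


if __name__ == "__main__":
    index = TermIndex()
    index.add("cassandra", 3, "doc-c")
    index.add("cassandra", 1, "doc-a")
    index.add("cassandra", 2, "doc-b")
    print(index.top_docs("cassandra", 2))  # ['doc-a', 'doc-b']
    index.add("cassandra", 2, "doc-x")  # same rank: replaces doc-b
    print(index.top_docs("cassandra", 3))  # ['doc-a', 'doc-x', 'doc-c']
```

One thing the sketch makes visible: because column names are unique within a row, two docs with the same rank for a term overwrite each other rather than coexisting, so a real scheme would need a column name that is unique per doc (for example, one that combines the rank with the doc id).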