Nice paper! It's a neat trick to index the large postings as separate files, i.e. let the filesystem handle the growth as new postings are appended over time.
But, unfortunately, we can't easily do this in Lucene, since Lucene assumes index files are write-once and derives its transactional semantics from that assumption. I.e., this would require sizable changes, beyond just swapping in a different Codec.

Still, the idea that small and big postings lists should be handled differently is something we can take advantage of in a Codec, and I think we should. I think we will likely switch to a default codec that uses pulsing (storing a term's postings directly in the terms dict) for very low freq terms, maybe vInt for medium freq terms, and FOR/PFOR for high freq terms.

Mike

On Mon, Oct 4, 2010 at 6:42 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hi all,
>
> Would it be possible to implement something like this in Flex?
>
> Büttcher, S., & Clarke, C. L. A. (2008). Hybrid index maintenance for
> contiguous inverted lists. Information Retrieval, 11(3), 175-207.
> doi:10.1007/s10791-007-9042-8
>
> The approach takes advantage of having a different policy for large postings
> lists (ie frequent terms) versus small postings lists for flushing the
> buffer and writing to disk.
>
> Tom Burton-West
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
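
To make the frequency-based split in the reply above a bit more concrete, here is a rough sketch in plain Java. It is not Lucene's Codec API: the class name, the enum, and both cutoff values are hypothetical and picked only for illustration (real cutoffs would need benchmarking). The only part that mirrors something Lucene actually defines is the vInt byte layout: 7 data bits per byte, low-order bits first, high bit set when more bytes follow.

```java
import java.io.ByteArrayOutputStream;

public class PostingsFormatSketch {
    // Hypothetical cutoffs, chosen only for illustration.
    static final int PULSING_MAX_DOC_FREQ = 1;
    static final int VINT_MAX_DOC_FREQ = 128;

    enum Encoding { PULSING, VINT, FOR_PFOR }

    // Pick an encoding from the term's document frequency.
    static Encoding choose(int docFreq) {
        if (docFreq <= PULSING_MAX_DOC_FREQ) return Encoding.PULSING; // inline in terms dict
        if (docFreq <= VINT_MAX_DOC_FREQ) return Encoding.VINT;       // variable-length ints
        return Encoding.FOR_PFOR;                                     // block-based packing
    }

    // vInt layout: 7 data bits per byte, low-order first,
    // high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode a sorted doc ID list as vInt deltas (gaps between neighbours).
    static byte[] encodeDocIdsAsVIntDeltas(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            writeVInt(out, docId - previous);
            previous = docId;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docIds = {5, 133, 134};
        System.out.println(choose(docIds.length) + " -> "
                + encodeDocIdsAsVIntDeltas(docIds).length + " bytes");
    }
}
```

The point of the split is that each regime has a different bottleneck: for rare terms the extra seek into the postings file dominates, so inlining wins, while for very frequent terms decode speed over long lists dominates, which is where block-based FOR/PFOR pays off.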