Robert Muir and I discussed what Robert eventually named "postings lists
deduplication" at the bbuzz 2013 conference in Berlin.

The idea is to allow multiple terms to point to the same postings list to
save space.
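To make the idea concrete, here is a minimal sketch. It assumes a toy in-memory index in which a postings list is just a tuple of (doc_id, position) pairs, with index-time synonyms stacked at the same position. `build_postings` and `deduplicate` are hypothetical helpers, not Lucene APIs, and real postings also carry frequencies, offsets and payloads, all of which would have to match before two lists could be shared.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each term to a tuple of (doc_id, position) pairs.

    Each doc is a list of positions; each position is a list of terms
    indexed there (e.g. a token plus its index-time synonyms).
    """
    postings = defaultdict(list)
    for doc_id, positions in enumerate(docs):
        for pos, terms in enumerate(positions):
            for term in terms:
                postings[term].append((doc_id, pos))
    return {term: tuple(plist) for term, plist in postings.items()}


def deduplicate(postings):
    """Store each distinct postings list once; terms point to it by index."""
    store = []      # unique postings lists
    seen = {}       # postings list -> index in store
    term_dict = {}  # term -> index in store
    for term in sorted(postings):
        plist = postings[term]
        if plist not in seen:
            seen[plist] = len(store)
            store.append(plist)
        term_dict[term] = seen[plist]
    return term_dict, store


# "car" and "automobile" are injected as synonyms at the same position.
docs = [
    [["the"], ["car", "automobile"], ["is"], ["red"]],
    [["an"], ["automobile", "car"], ["stopped"]],
]
postings = build_postings(docs)
term_dict, store = deduplicate(postings)
print(len(postings), "terms,", len(store), "stored postings lists")
# prints: 7 terms, 6 stored postings lists
```

Both "car" and "automobile" resolve to the same entry in `store`, so the synonym costs a term-dictionary entry but no second postings list.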

This would help wherever several terms end up with identical postings:
synonyms, exact / inexact terms, leading wildcard support via storing
reversed terms, etc.

At the moment, to support exact (unstemmed) and inexact (stemmed)
searches, we store both the unstemmed and the stemmed variant of each word
form, which bloats the index. For the same index-size reasons we had to
drop leading wildcard support, which we implemented by reversing tokens at
index and query time.
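A rough way to see how much of that bloat deduplication could remove is to count postings lists with and without sharing. The sketch below assumes a hypothetical naming scheme: the stem for inexact search, `token + "$"` for exact search, and `"~" + reversed token` for leading wildcards, all indexed at the same position. `toy_stem` only strips a trailing "s" and stands in for a real stemmer.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stemmer: strips a trailing "s". Real analyzers do far more.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def expand(token):
    # Terms indexed at one position under the hypothetical naming scheme:
    # stem (inexact), token + "$" (exact), "~" + reversed token (wildcard).
    return {toy_stem(token), token + "$", "~" + token[::-1]}


def count_postings_lists(docs):
    """Return (postings lists without dedup, postings lists with dedup)."""
    postings = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for pos, token in enumerate(tokens):
            for term in expand(token):
                postings[term].append((doc_id, pos))
    total = len(postings)
    unique = len({tuple(plist) for plist in postings.values()})
    return total, unique


docs = [["cats", "sleep"], ["cat", "sleeps", "dog"]]
print(count_postings_lists(docs))  # prints: (13, 7)
```

Every reversed term duplicates the postings of its exact form, and a word seen in only one form ("dog") yields three terms that share a single list, so in this toy index 13 terms need only 7 stored postings lists.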

Should I open a JIRA issue for this?

Thanks,

Dmitry Kan
