At the bbuzz 2013 conference in Berlin, Robert Muir and I discussed what Robert eventually named "postings lists deduplication".
The idea is to let multiple terms point to the same postings list in order to save space. This would benefit synonyms, exact / inexact term matching, leading wildcard support via storing the reversed term, etc. At the moment, to support both exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and the stemmed variant of each word form, which bloats the index. For example, because of the same index size considerations, we had to remove leading wildcard support, which worked by reversing each token at index and query time.

Would you like a JIRA issue for this?

Thanks,
Dmitry Kan
