Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case.
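To make the "union of N posting lists" question concrete, here is a minimal sketch of a k-way merge over sorted doc ID lists, using plain Java arrays rather than the real Lucene postings API. The class name `PostingsUnion`, the `union` method, and the sample doc IDs are all hypothetical and only for illustration. It assumes each list is sorted ascending and that doc IDs are non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: none of these names come from Lucene.
class PostingsUnion {

    /** Cursor over one posting list, assumed sorted by ascending doc ID. */
    private static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) { this.docs = docs; }

        int current() { return docs[pos]; }

        /** Moves to the next doc; returns false once the list is exhausted. */
        boolean advance() { return ++pos < docs.length; }
    }

    /** Merges N sorted posting lists into one sorted list without duplicates. */
    static int[] union(int[][] postingLists) {
        // Min-heap ordered by the doc ID each cursor currently points at.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] list : postingLists) {
            if (list.length > 0) {
                heap.add(new Cursor(list));
            }
        }

        List<Integer> merged = new ArrayList<>();
        int last = -1; // assumption: doc IDs are never negative
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            int doc = smallest.current();
            if (doc != last) { // skip docs already emitted from another list
                merged.add(doc);
                last = doc;
            }
            if (smallest.advance()) {
                heap.add(smallest); // re-insert with its new current doc
            }
        }
        return merged.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // E.g. a stemmed term pointing at the lists of three surface forms.
        int[][] surfaceForms = { {1, 4, 9}, {2, 4, 7}, {9, 10} };
        System.out.println(Arrays.toString(union(surfaceForms)));
    }
}
```

The heap keeps the merge at O(total postings * log N). A real implementation would also have to merge positions for documents that appear in more than one list, which this sketch ignores.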
And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N posting lists?

Such a thing might also be do-able as runtime only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 6, 2013 at 3:51 AM, Dmitry Kan <[email protected]> wrote:
> Robert Muir and I have discussed what Robert eventually named "postings
> lists deduplication" at bbuzz 2013 conference in Berlin.
>
> The idea is to allow multiple terms to point to the same postings list to
> save space.
>
> The application / impact of this is positive for synonyms, exact / inexact
> terms, leading wildcard support via storing reversed term etc.
>
> At the moment, when supporting exact (unstemmed) and inexact (stemmed)
> searches, we store both unstemmed and stemmed variant of a word form and
> that leads to index bloating. For example, we had to remove the leading
> wildcard support via reversing a token on index and query time because of
> the same index size considerations.
>
> Would you like a jira for this?
>
> Thanks,
>
> Dmitry Kan
