Yep, makes sense. It could be a little slower, but it would decrease the number of terms indexed by a factor of 256 (for 8 bits).
But the payload part... seems like another case of using that because CSF isn't there yet, right? (Well, perhaps except if you didn't want to store the field...)

-Yonik
http://www.lucidimagination.com

On Wed, Jun 10, 2009 at 2:28 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi, sorry I missed the first mail.
>
> The idea we discussed in Amsterdam during ApacheCon was:
>
> Instead of indexing all trie precisions from e.g. the leftmost 8 bits down to
> all 64 bits, the TrieTokenStream only creates terms from e.g. precisions 8
> to 56. The last precision is left out. Instead, the last term (precision 56)
> contains the highest precision as payload.
>
> On the query side, TrieRangeQuery would create the filter bitmap as before
> until it reaches the lowest available precision, the one carrying the
> payloads. Instead of splitting this precision further into terms, it would
> enumerate TermPositions rather than just TermDocs, and set in the result
> BitSet only those documents whose payload lies inside the range bounds. This
> way the trie query first selects large ranges in the middle as before, but
> at the ends uses the highest available (but not full) precision terms to
> select more doc IDs than needed and then filters them with the payload.
>
> With String Dates (the simplified example Michael Busch shows in his talk):
>
> Searching all docs from 2005-11-10 to 2008-03-11 with the current trierange
> variant would select terms 2005-11-10 to 2005-11-30, then the whole of
> December, the whole years 2006 and 2007, and so on. With payloads, trierange
> would select only whole months (November, December, 2006, 2007, Jan, Feb,
> Mar). At the ends, the payloads are used to filter out the days in Nov 2005
> and Mar 2008.
>
> With the latest TrieRange impl this would be possible to implement (because
> the TrieTokenStreams now used for indexing could create the payloads). Only
> the searching side would no longer be as “simple” to implement as it is now.
> My biggest problem is how to configure this optimally and keep the API clean.
>
> Was it understandable? (It's complicated, I know)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> ________________________________
>
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Wednesday, June 10, 2009 7:59 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Payloads and TrieRangeQuery
>
> I think instead of ORing postings (trie range, rangequery, etc), have a
> custom Query + Scorer that examines the payload (somehow)? It could encode
> the multiple levels of trie bits in it? (I'm just guessing here).
>
> On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>
> Use them how? (Sounds interesting...).
>
> Mike
>
> On Tue, Jun 9, 2009 at 10:32 PM, Jason
> Rutherglen<jason.rutherg...@gmail.com> wrote:
>> At the SF Lucene User's group, Michael Busch mentioned using
>> payloads with TrieRangeQueries. Is this something that's being
>> worked on? I'm interested in what sort of performance benefits
>> there would be to this method.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
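
For readers following the thread, the scheme Uwe describes can be sketched roughly as below. This is only a toy model in Python, not Lucene code: the names `build_index` and `search`, the dict-based "index", and the 16-bit / 4-bit-step sizes are all assumptions chosen to keep the example small (the thread talks about 64-bit values and 8-bit steps). Terms are indexed at every precision except the full one, the lowest indexed precision carries the full value as a payload, and range edges are resolved by checking that payload instead of descending to full-precision terms.

```python
from collections import defaultdict

# Hypothetical sizes for the demo; the thread discusses 64-bit values, 8-bit steps.
BITS = 16
STEP = 4
LOWEST_SHIFT = STEP  # shift 0 (full precision) is deliberately NOT indexed


def build_index(values):
    """Map (shift, prefix) -> [(doc_id, payload)].

    Only terms at LOWEST_SHIFT carry the full value as payload.
    """
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        for shift in range(LOWEST_SHIFT, BITS, STEP):
            payload = value if shift == LOWEST_SHIFT else None
            index[(shift, value >> shift)].append((doc_id, payload))
    return index


def search(index, lo, hi):
    """Return the doc ids whose value lies in [lo, hi] (inclusive)."""
    hits = set()

    def visit(shift, prefix):
        block_lo = prefix << shift
        block_hi = block_lo + (1 << shift) - 1
        if block_hi < lo or block_lo > hi:
            return  # block lies entirely outside the range
        postings = index.get((shift, prefix), ())
        if lo <= block_lo and block_hi <= hi:
            # Block fully inside the range: take every doc, no payload needed.
            hits.update(doc for doc, _ in postings)
        elif shift == LOWEST_SHIFT:
            # Edge block at the lowest indexed precision: over-select the
            # term, then filter its docs by the full-precision payload.
            hits.update(doc for doc, payload in postings if lo <= payload <= hi)
        else:
            # Partially overlapping block: split into the next precision.
            # (A real implementation would enumerate existing terms instead
            # of probing every possible child prefix.)
            for child in range(1 << STEP):
                visit(shift - STEP, (prefix << STEP) | child)

    for top in range(1 << STEP):
        visit(BITS - STEP, top)
    return hits


values = [3, 17, 18, 300, 4096, 40000]
index = build_index(values)
print(sorted(search(index, 17, 4096)))  # doc ids of 17, 18, 300 and 4096
```

In this toy run the values 17 and 4096 sit in edge blocks and are accepted only after the payload check, while 300 is picked up by a coarser term that lies wholly inside the range. The trade-off is the one Yonik mentions at the top: fewer indexed terms, at the cost of reading payloads at the range ends.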