Hi, sorry I missed the first mail.
The idea we discussed in Amsterdam during ApacheCon was this: instead of indexing all trie precisions, e.g. from the leftmost 8 bits down to all 64 bits, the TrieTokenStream only creates terms for e.g. precisions 8 to 56. The last (full) precision is left out. Instead, the last indexed term (precision 56) carries the full-precision value as a payload.

On the query side, TrieRangeQuery would build the filter bitmap as before until it reaches the last indexed precision, the one carrying the payloads. Instead of splitting this precision further into terms, it would enumerate TermPositions rather than just TermDocs, and set in the result BitSet only those documents whose payload lies inside the range bounds. This way the trie query still selects the large ranges in the middle as before, but at the edges it uses the highest indexed (but not full-precision) terms to select more doc IDs than needed and then filters them by payload.

With string dates (the simplified example Michael Busch shows in his talk): searching all docs from 2005-11-10 to 2008-03-11 with the current trie range variant would select the terms 2005-11-10 to 2005-11-30, then the whole of December, the whole years 2006 and 2007, and so on. With payloads, the trie range would select only whole months and years (November, December, 2006, 2007, Jan, Feb, Mar). At the two ends the payloads are used to filter out the unwanted days in Nov 2005 and Mar 2008.

With the latest TrieRange impl this would be possible to implement (because the TrieTokenStreams now used for indexing could create the payloads). Only the searching side would no longer be as "simple" to implement as it is now. My biggest problem is how to configure this optimally and keep the API clean.

Was it understandable?
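If code helps, here is a rough Python toy model of the date example. It is only a sketch of the idea, not how TrieRangeQuery or any Lucene API actually works: the names `build_index` and `range_query` are made up for illustration, the "index" is just two dicts, and it assumes zero-padded `YYYY-MM-DD` strings with only two precisions (year and month). Month postings carry the full date as their payload.

```python
from collections import defaultdict

def build_index(dates):
    """Toy index: one year term and one month term per doc.
    The day precision is NOT indexed; the month posting carries
    the full date as a payload instead. (Hypothetical helper.)"""
    year_terms = defaultdict(set)     # "2006"    -> {doc ids}
    month_terms = defaultdict(list)   # "2005-11" -> [(doc id, payload)]
    for doc_id, date in enumerate(dates):
        year_terms[date[:4]].add(doc_id)
        month_terms[date[:7]].append((doc_id, date))
    return year_terms, month_terms

def range_query(index, lo, hi):
    """Return sorted doc ids with lo <= date <= hi (inclusive bounds)."""
    year_terms, month_terms = index
    lo_year, hi_year = lo[:4], hi[:4]
    lo_month, hi_month = lo[:7], hi[:7]
    hits = set()

    # Middle of the range: whole years, no payloads needed.
    for year, docs in year_terms.items():
        if lo_year < year < hi_year:
            hits |= docs

    for month, postings in month_terms.items():
        if month in (lo_month, hi_month):
            # Edge months: the term over-selects, the payload filters.
            hits.update(d for d, payload in postings if lo <= payload <= hi)
        elif lo_month < month < hi_month and month[:4] in (lo_year, hi_year):
            # Whole months inside the boundary years, no payloads needed.
            hits.update(d for d, _ in postings)

    return sorted(hits)

dates = ["2005-11-09", "2005-11-10", "2005-12-24", "2006-07-01",
         "2007-01-15", "2008-02-29", "2008-03-11", "2008-03-12",
         "2004-05-05", "2009-01-01"]
index = build_index(dates)
print(range_query(index, "2005-11-10", "2008-03-11"))  # [1, 2, 3, 4, 5, 6]
```

Only the two edge months (2005-11 and 2008-03) touch payloads; everything in between is answered from plain terms, which is the trade-off described above.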
(It's complicated, I know)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

_____

From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Wednesday, June 10, 2009 7:59 PM
To: java-dev@lucene.apache.org
Subject: Re: Payloads and TrieRangeQuery

I think instead of ORing postings (trie range, rangequery, etc), have a custom Query + Scorer that examines the payload (somehow)? It could encode the multiple levels of trie bits in it? (I'm just guessing here).

On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

Use them how? (Sounds interesting...).

Mike

On Tue, Jun 9, 2009 at 10:32 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> At the SF Lucene User's group, Michael Busch mentioned using
> payloads with TrieRangeQueries. Is this something that's being
> worked on? I'm interested in what sort of performance benefits
> there would be to this method?