Hi, sorry I missed the first mail.

 

The idea we discussed in Amsterdam during ApacheCon was:

 

Instead of indexing all trie precisions, from e.g. the leftmost 8 bits down to
all 64 bits, the TrieTokenStream only creates terms for e.g. precisions 8
to 56. The last (full) precision is left out. Instead, the last term
(precision 56) carries the highest-precision value as its payload.
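
A minimal sketch of the indexing side, in plain Python rather than Lucene code. The function name `trie_terms`, the tuple layout and the choice to store the complete 64-bit value as the payload are assumptions made for illustration only; they are not the TrieTokenStream API.

```python
BITS = 64  # width of the indexed value

def trie_terms(value, step=8, max_precision=56):
    """Return [(precision, prefix, payload)] for one unsigned 64-bit value.

    Hypothetical helper: one term per precision from `step` up to
    `max_precision`; the full 64-bit precision is never indexed as a term.
    """
    terms = []
    for precision in range(step, max_precision + 1, step):
        # Keep only the leftmost `precision` bits of the value.
        prefix = value >> (BITS - precision)
        # Assumption: only the last indexed term carries the full value.
        payload = value if precision == max_precision else None
        terms.append((precision, prefix, payload))
    return terms

terms = trie_terms(0x0123456789ABCDEF)
for precision, prefix, payload in terms:
    print(precision, hex(prefix), None if payload is None else hex(payload))
```

With these defaults a value produces seven terms instead of eight, and only the precision-56 term has a payload attached.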

On the query side, TrieRangeQuery would build the filter bitmap as before
until it reaches the last available precision, the one that carries the
payloads. Instead of splitting this precision further into terms, it lists
TermPositions instead of just TermDocs, and sets in the result BitSet only
those documents whose payload lies inside the range bounds. This way the trie
query still selects the large ranges in the middle as before, but at the ends
it uses the highest (but not full) precision term to select more docids than
needed and then filters them by payload.
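
The payload filtering at the range ends can be sketched as follows. This is a plain-Python toy, not Lucene code: `build_index`, `range_query`, the dict-based postings and the 8-bit `SHIFT` are all hypothetical, and the sketch walks every term at the payload-carrying precision, whereas a real trie query would cover the inner part of the range with coarser terms.

```python
from collections import defaultdict

SHIFT = 8  # assumption: bits dropped by the last indexed precision (64 - 56)

def build_index(docs):
    """docs: {doc_id: value}. Returns {prefix: [(doc_id, payload), ...]}."""
    postings = defaultdict(list)
    for doc_id, value in docs.items():
        # The payload stored with each posting is the full-precision value.
        postings[value >> SHIFT].append((doc_id, value))
    return postings

def range_query(postings, lo, hi):
    """Return the set of doc ids whose value lies in [lo, hi]."""
    lo_prefix, hi_prefix = lo >> SHIFT, hi >> SHIFT
    hits = set()
    for prefix, entries in postings.items():
        if lo_prefix < prefix < hi_prefix:
            # Inner term: every document matches, payloads are not read.
            hits.update(doc_id for doc_id, _ in entries)
        elif prefix in (lo_prefix, hi_prefix):
            # Boundary term: selects too many docs, so filter by payload.
            hits.update(d for d, payload in entries if lo <= payload <= hi)
    return hits

docs = {1: 0x0FF0, 2: 0x1005, 3: 0x10F0, 4: 0x2000,
        5: 0x3010, 6: 0x30FF, 7: 0x3100}
postings = build_index(docs)
result = range_query(postings, 0x1010, 0x3020)
print(sorted(result))
```

Documents 2 and 6 share a term with the range bounds but are rejected by the payload check, while document 4 is accepted without reading its payload at all.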

 

With String Dates (the simplified example Michael Busch shows in his talk):

Searching all docs from 2005-11-10 to 2008-03-11 with the current TrieRange
variant would select the terms 2005-11-10 to 2005-11-30, then the whole of
December, the whole years 2006 and 2007, and so on. With payloads, TrieRange
would select only whole months and years (November, December, 2006, 2007,
Jan, Feb, Mar). At the ends, the payloads are used to filter out the
non-matching days in Nov 2005 and Mar 2008.
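
The term selection for this date example can be written out as a small Python sketch. The function `select_terms` and the term strings ("2005-12", "2006") are hypothetical and only mirror the simplified string-date illustration; the sketch always treats the first and last month as payload-filtered, even if the range happens to start on the 1st or end on the last day of a month.

```python
def select_terms(start, end):
    """start/end: (year, month, day) tuples with start <= end.

    Returns (whole_terms, payload_filtered_terms): terms whose documents
    all match, and boundary month terms that need a payload (day) check.
    """
    (sy, sm, _), (ey, em, _) = start, end

    def ym(year, month):
        return "%04d-%02d" % (year, month)

    # The first and last month are only partially covered by the range.
    filtered = sorted({ym(sy, sm), ym(ey, em)})
    if sy == ey:
        whole = [ym(sy, m) for m in range(sm + 1, em)]
    else:
        whole = [ym(sy, m) for m in range(sm + 1, 13)]     # rest of first year
        whole += ["%04d" % y for y in range(sy + 1, ey)]   # whole years
        whole += [ym(ey, m) for m in range(1, em)]         # start of last year
    return whole, filtered

def payload_matches(date, start, end):
    """Payload check for documents found via a boundary month term."""
    return start <= date <= end

whole, filtered = select_terms((2005, 11, 10), (2008, 3, 11))
print(whole)     # whole months and years, no payload check
print(filtered)  # boundary months, filtered by day
```

For the range above, the whole terms are December 2005, the years 2006 and 2007, and January and February 2008; only November 2005 and March 2008 need the payload check.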

 

With the latest TrieRange impl this would be possible to implement (because
the TrieTokenStreams now used for indexing could create the payloads). Only
the searching side would no longer be as "simple" to implement as it is now.
My biggest problem is how to configure this optimally and keep the API clean.

 

Was it understandable? (It's complicated, I know)

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

  _____  

From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Wednesday, June 10, 2009 7:59 PM
To: java-dev@lucene.apache.org
Subject: Re: Payloads and TrieRangeQuery

 

I think instead of ORing postings (trie range, rangequery, etc), have a
custom Query + Scorer that examines the payload (somehow)?  It could encode
the multiple levels of trie bits in it?  (I'm just guessing here).

On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:

Use them how?  (Sounds interesting...).

Mike


On Tue, Jun 9, 2009 at 10:32 PM, Jason
Rutherglen<jason.rutherg...@gmail.com> wrote:
> At the SF Lucene User's group, Michael Busch mentioned using
> payloads with TrieRangeQueries. Is this something that's being
> worked on? I'm interested in what sort of performance benefits
> there would be to this method?
>

