Ooh that sounds compelling!

So you would not need to use payloads for the "inside" brackets,
right?  Only for the edges?

I wonder how performance would compare.  Without payloads, there are
many more terms (for the tiny ranges) in the index, and your OR query
will have lots of these tiny terms.  But then these tiny terms don't
hit many docs, and with BooleanScorer (which we should switch to for
OR queries) ought not be very costly.  Vs w/ payloads having to use
TermPositions, having to load, decode & check the payload, and I guess
assuming on average that 1/2 the docs are filtered out.

Mike

On Wed, Jun 10, 2009 at 2:28 PM, Uwe Schindler<u...@thetaphi.de> wrote:
> Hi, sorry I missed the first mail.
>
>
>
> The idea we discussed in Amsterdam during ApacheCon was:
>
>
>
> Instead of indexing all trie precisions from e.g. the leftmost 8 bits downto
> all 64 bits, the TrieTokenStream only creates terms from e.g. precisions 8
> to 56. The last precision is left out. Instead the last term (precision 56)
> contains the highest precision as payload.
>
> On the query side, TrieRangeQuery would create the filter bitmap as before
> until it reaches the lowest available precision with the payloads. Instead
> of further splitting this precision into terms, all TermPositions instead of
> just TermDocs are listed, but only those set in the result BitSet, that have
> the payload inside the range bounds. By this the trie query first selects
> large ranges in the middle like before, but uses the highest (but not full
> precision term) to select more docids than needed but filters them with the
> payload.
>
>
>
> With String Dates (the simplified example Michael Busch shows in his talk):
>
> Searching all docs from 2005-11-10 to 2008-03-11 with current trierange
> variant would select terms 2005-11-10 to 2005-11-30, then the whole
> December, the whole years 2006 and 2007 and so on. With payloads, trierange
> would select only whole months (November, December, 2006, 2007, Jan, Feb,
> Mar). At the ends the payloads are used to filter out the days in Nov 2005
> and Mar 2008.
>
>
>
> With the latest TrieRange impl this would be possible to implement (because
> the TrieTokenStreams now used for indexing could create the payloads). Only
> the searching side would no longer so “simple” implemented as yet. My
> biggest problem is how to configure this optimal and make the API clean.
>
>
>
> Was it understandable? (Its complicated, I know)
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> ________________________________
>
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Wednesday, June 10, 2009 7:59 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Payloads and TrieRangeQuery
>
>
>
> I think instead of ORing postings (trie range, rangequery, etc), have a
> custom Query + Scorer that examines the payload (somehow)?  It could encode
> the multiple levels of trie bits in it?  (I'm just guessing here).
>
> On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>
> Use them how?  (Sounds interesting...).
>
> Mike
>
> On Tue, Jun 9, 2009 at 10:32 PM, Jason
> Rutherglen<jason.rutherg...@gmail.com> wrote:
>> At the SF Lucene User's group, Michael Busch mentioned using
>> payloads with TrieRangeQueries. Is this something that's being
>> worked on? I'm interested in what sort performance benefits
>> there would be to this method?
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to