Yep, makes sense.  It could be a little slower, but it would decrease
the number of terms indexed by a factor of 256 (for 8 bits).
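The factor of 256 follows from the precision step. With 8-bit steps, each term at the finest indexed level covers 2^8 = 256 consecutive values, so for dense values, dropping the full-precision level removes most of the terms. The Java sketch below counts distinct terms per level to check this. The class and method names are hypothetical, values are treated as plain unsigned longs, and the sketch does not use the term encoding Lucene's trie code actually uses.

```java
import java.util.HashSet;
import java.util.Set;

final class TrieTermCount {
    static final int PRECISION_STEP = 8;

    // Sums the number of distinct trie terms over all precision levels.
    // minShift = 0 includes the full-precision level; minShift = 8 drops it,
    // as proposed in this thread. Hypothetical simplification: values are
    // used as-is, without any sortable or prefix-coded term encoding.
    static long countTerms(long[] values, int minShift) {
        long total = 0;
        for (int shift = minShift; shift < 64; shift += PRECISION_STEP) {
            Set<Long> distinct = new HashSet<>();
            for (long v : values) {
                distinct.add(v >>> shift); // term at this level: value with low bits dropped
            }
            total += distinct.size();
        }
        return total;
    }
}
```

For the 65,536 consecutive values 0..65535, `countTerms(values, 0)` gives 65,798 terms and `countTerms(values, 8)` gives 262, a reduction of roughly 250x. With sparse values, fewer of them share a finest-level term, so the saving is smaller.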

But the payload part... seems like another case of using payloads because
CSF (column-stride fields) isn't there yet, right?
(well, perhaps except if you didn't want to store the field...)

-Yonik
http://www.lucidimagination.com

On Wed, Jun 10, 2009 at 2:28 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Hi, sorry I missed the first mail.
>
> The idea we discussed in Amsterdam during ApacheCon was:
>
> Instead of indexing all trie precisions, e.g. from the leftmost 8 bits down
> to all 64 bits, the TrieTokenStream would only create terms for precisions
> 8 to 56 (again as an example). The last (full) precision is left out.
> Instead, the last term (precision 56) carries the full-precision value as
> its payload.
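The indexing side of this proposal can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Lucene code. It emits one token per precision from 8 to 56 bits, and it attaches the full 64-bit value as an 8-byte payload only to the finest token. The `Token` class and the method names are hypothetical stand-ins for what a TrieTokenStream would produce. The real term encoding is also omitted.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class PayloadTrieTokens {
    static final int PRECISION_STEP = 8;

    // Hypothetical token: number of dropped low bits, the remaining prefix,
    // and an optional payload (null when the token carries none).
    static final class Token {
        final int shift;
        final long prefix;
        final byte[] payload;
        Token(int shift, long prefix, byte[] payload) {
            this.shift = shift;
            this.prefix = prefix;
            this.payload = payload;
        }
    }

    // Emits terms for precisions 8, 16, ..., 56 bits (shifts 56 down to 8).
    // The full 64-bit precision (shift 0) is not indexed as a term; instead
    // the finest emitted term (shift 8) carries the whole value as payload.
    static List<Token> tokens(long value) {
        List<Token> out = new ArrayList<>();
        for (int shift = 64 - PRECISION_STEP; shift >= PRECISION_STEP; shift -= PRECISION_STEP) {
            byte[] payload = null;
            if (shift == PRECISION_STEP) {
                payload = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
            }
            out.add(new Token(shift, value >>> shift, payload));
        }
        return out;
    }

    // Reads back the full-precision value stored in a payload.
    static long decodePayload(byte[] payload) {
        return ByteBuffer.wrap(payload).getLong();
    }
}
```

Each value yields 7 tokens instead of 8. Only the last token pays the extra 8 bytes of payload.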
>
> On the query side, TrieRangeQuery would build the filter bitmap as before
> until it reaches the finest indexed precision (the terms carrying the
> payloads). Instead of splitting this precision further into terms, it lists
> all TermPositions (instead of just TermDocs) and sets in the result BitSet
> only those docs whose payload lies inside the range bounds. So the trie
> query still selects the large ranges in the middle as before. At the edges,
> it uses the highest indexed (but not full-precision) terms, which match
> more docids than needed, and filters those docids by their payload.
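The query-side filtering step can be sketched with a toy in-memory index. The sketch assumes the index maps each finest-level term (value >>> 8) to postings that carry the full value as payload. A real implementation would read those payloads from TermPositions instead. All names here are hypothetical. The sketch also takes one simplification: the inner terms are enumerated at the finest level, whereas real trie code would cover most of that middle with coarser terms. The point it shows is that inner terms are accepted outright, while the two edge terms are filtered by payload.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PayloadRangeFilter {
    static final int SHIFT = 8; // finest indexed precision drops the low 8 bits

    // One posting of a finest-level term: doc id plus its payload (full value).
    static final class Posting {
        final int doc;
        final long value;
        Posting(int doc, long value) { this.doc = doc; this.value = value; }
    }

    // Hypothetical in-memory "index": finest-level term -> postings with payloads.
    static void add(TreeMap<Long, List<Posting>> index, int doc, long value) {
        index.computeIfAbsent(value >>> SHIFT, k -> new ArrayList<>()).add(new Posting(doc, value));
    }

    // Assumes non-negative values and lo <= hi. Inner terms lie completely
    // inside [lo, hi] and are accepted without looking at payloads. The two
    // edge terms match more docs than needed, so their payloads are checked.
    static BitSet filter(TreeMap<Long, List<Posting>> index, long lo, long hi) {
        BitSet result = new BitSet();
        long loTerm = lo >>> SHIFT;
        long hiTerm = hi >>> SHIFT;
        for (Map.Entry<Long, List<Posting>> e : index.subMap(loTerm, true, hiTerm, true).entrySet()) {
            boolean inner = e.getKey() > loTerm && e.getKey() < hiTerm;
            for (Posting p : e.getValue()) {
                if (inner || (p.value >= lo && p.value <= hi)) {
                    result.set(p.doc);
                }
            }
        }
        return result;
    }
}
```

Take a doc with value 260 and a range starting at 280. Both fall into the same finest-level term (1). That doc is therefore matched by the term but rejected by its payload.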
>
> With string dates (the simplified example Michael Busch showed in his talk):
>
> Searching all docs from 2005-11-10 to 2008-03-11 with the current TrieRange
> variant would select the terms 2005-11-10 to 2005-11-30, then the whole of
> December 2005, the whole years 2006 and 2007, and so on. With payloads,
> TrieRange would select only whole months and years (Nov 2005, Dec 2005,
> 2006, 2007, Jan 2008, Feb 2008, Mar 2008). At the ends, the payloads are
> used to filter out the days outside the range in Nov 2005 and Mar 2008.
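The date example can be made concrete by listing the terms each variant selects. This is a minimal sketch with hypothetical method names. It assumes the two endpoints lie in different years and that neither falls on a month boundary; in that boundary case, the classic variant could use a whole-month term instead of day terms. The payload check itself is not shown.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

final class DateTrieTerms {
    // Classic variant: day terms at both edges, months and years in between.
    // Assumes from and to lie in different years and neither is a month boundary.
    static List<String> classicTerms(LocalDate from, LocalDate to) {
        List<String> terms = new ArrayList<>();
        YearMonth first = YearMonth.from(from);
        YearMonth last = YearMonth.from(to);
        for (LocalDate d = from; !d.isAfter(first.atEndOfMonth()); d = d.plusDays(1)) {
            terms.add(d.toString()); // e.g. 2005-11-10 ... 2005-11-30
        }
        for (YearMonth m = first.plusMonths(1); m.getYear() == from.getYear(); m = m.plusMonths(1)) {
            terms.add(m.toString()); // remaining months of the first year
        }
        for (int y = from.getYear() + 1; y < to.getYear(); y++) {
            terms.add(Integer.toString(y)); // whole years in between
        }
        for (YearMonth m = YearMonth.of(to.getYear(), 1); m.isBefore(last); m = m.plusMonths(1)) {
            terms.add(m.toString()); // leading months of the last year
        }
        for (LocalDate d = last.atDay(1); !d.isAfter(to); d = d.plusDays(1)) {
            terms.add(d.toString()); // e.g. 2008-03-01 ... 2008-03-11
        }
        return terms;
    }

    // Payload variant: whole months at the edges. The day stored in each
    // posting's payload would be checked against from/to (not shown here).
    static List<String> payloadTerms(LocalDate from, LocalDate to) {
        List<String> terms = new ArrayList<>();
        YearMonth first = YearMonth.from(from);
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = first; m.getYear() == from.getYear(); m = m.plusMonths(1)) {
            terms.add(m.toString());
        }
        for (int y = from.getYear() + 1; y < to.getYear(); y++) {
            terms.add(Integer.toString(y));
        }
        for (YearMonth m = YearMonth.of(to.getYear(), 1); !m.isAfter(last); m = m.plusMonths(1)) {
            terms.add(m.toString());
        }
        return terms;
    }
}
```

For 2005-11-10 to 2008-03-11, `classicTerms` returns 37 terms (21 days, 1 month, 2 years, 2 months, 11 days). `payloadTerms` returns the 7 month and year terms listed above.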
>
> With the latest TrieRange implementation this would be possible (because
> the TrieTokenStreams now used for indexing could create the payloads). Only
> the search side would no longer be as “simple” to implement as it is now.
> My biggest problem is how to configure this optimally and keep the API
> clean.
>
> Was it understandable? (It's complicated, I know.)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> ________________________________
>
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Wednesday, June 10, 2009 7:59 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Payloads and TrieRangeQuery
>
> I think instead of ORing postings (trie range, RangeQuery, etc.), we could
> have a custom Query + Scorer that examines the payload (somehow)? It could
> encode the multiple levels of trie bits in it? (I'm just guessing here.)
>
> On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>
> Use them how?  (Sounds interesting...).
>
> Mike
>
> On Tue, Jun 9, 2009 at 10:32 PM, Jason
> Rutherglen <jason.rutherg...@gmail.com> wrote:
>> At the SF Lucene User's group, Michael Busch mentioned using
>> payloads with TrieRangeQueries. Is this something that's being
>> worked on? I'm interested in what sort of performance benefits
>> there would be with this method.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
