[ 
https://issues.apache.org/jira/browse/LUCENE-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062343#comment-14062343
 ] 

Michael McCandless commented on LUCENE-5819:
--------------------------------------------

The gist of the change here is that the terms index FST, via a new
custom Outputs impl FSTOrdsOutputs, now also stores the start and end
ord range for each block.  The end ord is also necessary because the
terms don't neatly fall into just the leaf blocks: "straggler" terms
can easily fall inside inner blocks, and in this case we need the end
ord of the lower blocks to realize the term is a "straggler".

The on-disk blocks themselves are nearly the same; the only difference
is that when a block writes a pointer to a sub-block, it now also
writes (as a vLong) how many terms are in that sub-block.  This way,
when we are seeking by ord and skip that sub-block, we know how many
ords were just skipped.
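
To illustrate why the per-sub-block term count is enough to seek by
ord, here is a minimal in-memory sketch.  It is not the patch's code
and not the on-disk format: OrdSeekSketch, Entry, pointerTo and
seekByOrd are hypothetical names, and blocks are modeled as plain
Java lists rather than encoded bytes.

```java
import java.util.Arrays;
import java.util.List;

/** Toy in-memory model; hypothetical names, not Lucene's classes or file format. */
class OrdSeekSketch {

  /** One block entry: either a term, or a pointer to a sub-block plus its term count. */
  static final class Entry {
    final String term;          // non-null for a plain term entry
    final List<Entry> subBlock; // non-null for a sub-block pointer
    final long termCount;       // terms covered by this entry (1 for a plain term)

    private Entry(String term, List<Entry> subBlock, long termCount) {
      this.term = term;
      this.subBlock = subBlock;
      this.termCount = termCount;
    }

    static Entry term(String term) {
      return new Entry(term, null, 1);
    }

    /** The count stored with the pointer is the total number of terms below it. */
    static Entry pointerTo(Entry... entries) {
      long count = 0;
      for (Entry e : entries) {
        count += e.termCount;
      }
      return new Entry(null, Arrays.asList(entries), count);
    }
  }

  /** Returns the term with the given ord, or null if the ord is not in this block. */
  static String seekByOrd(List<Entry> block, long blockStartOrd, long targetOrd) {
    if (targetOrd < blockStartOrd) {
      return null;
    }
    long ord = blockStartOrd; // ord of the first term covered by the current entry
    for (Entry e : block) {
      if (targetOrd < ord + e.termCount) {
        // The target is this term, or lies somewhere inside this sub-block.
        return e.subBlock == null ? e.term : seekByOrd(e.subBlock, ord, targetOrd);
      }
      // Skip the entry (possibly a whole sub-block) without descending into it.
      ord += e.termCount;
    }
    return null;
  }

  /** Example block: "a", then a sub-block holding "ba", "bb", "bc", then "c". */
  static List<Entry> exampleRoot() {
    return Arrays.asList(
        Entry.term("a"),
        Entry.pointerTo(Entry.term("ba"), Entry.term("bb"), Entry.term("bc")),
        Entry.term("c"));
  }

  public static void main(String[] args) {
    // Reaching ord 4 skips the three-term sub-block using only its stored count.
    System.out.println(seekByOrd(exampleRoot(), 0, 4));
  }
}
```

In this toy example "c" is reached without ever opening the "b"
sub-block; only the count stored next to the pointer is consulted.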

I made a custom getByOutput to handle the ranges, falling back to the
last range that included the target ord while recursing.
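
The fallback idea can be sketched roughly as follows.  This is only
an illustration under assumptions, not the actual getByOutput code:
the FST is replaced by a simple tree of nodes, the end ord is taken
to be inclusive, and OrdRangeLookupSketch, Node and findBlock are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the terms index; hypothetical names, not Lucene's FST API. */
class OrdRangeLookupSketch {

  /** One indexed block: its prefix and the ord range (end assumed inclusive) it covers. */
  static final class Node {
    final String prefix;
    final long startOrd;
    final long endOrd;
    final List<Node> children = new ArrayList<>();

    Node(String prefix, long startOrd, long endOrd) {
      this.prefix = prefix;
      this.startOrd = startOrd;
      this.endOrd = endOrd;
    }

    Node add(Node child) {
      children.add(child);
      return this;
    }

    boolean contains(long ord) {
      return ord >= startOrd && ord <= endOrd;
    }
  }

  /**
   * Descends while some child range contains targetOrd and returns the last
   * node whose range did.  A "straggler" ord falls in a parent range but in
   * none of the child ranges, so the parent itself is returned.
   */
  static Node findBlock(Node root, long targetOrd) {
    if (!root.contains(targetOrd)) {
      return null;
    }
    Node last = root;
    boolean descended = true;
    while (descended) {
      descended = false;
      for (Node child : last.children) {
        if (child.contains(targetOrd)) {
          last = child;
          descended = true;
          break;
        }
      }
    }
    return last;
  }

  /** Root covers ords 0..9; ords 4 and 5 are stragglers living in the root block. */
  static Node exampleRoot() {
    return new Node("", 0, 9)
        .add(new Node("a", 0, 3))
        .add(new Node("c", 6, 9).add(new Node("ca", 6, 7)));
  }

  public static void main(String[] args) {
    System.out.println("ord 5 -> block '" + findBlock(exampleRoot(), 5).prefix + "'");
    System.out.println("ord 7 -> block '" + findBlock(exampleRoot(), 7).prefix + "'");
  }
}
```

Here ord 5 resolves to the root block because no child range
contains it, which is the straggler case the end ord is needed for.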

Otherwise the terms dict is basically the same as the normal block
tree, including optimized intersect (without ord() implemented; not
sure we need it), except that all seek/next operations also compute
the term ord.  Floor blocks also store the term ord each one starts on.
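
One way a per-floor-block starting ord could be used when seeking by
ord is sketched below.  This is an assumption for illustration, not
necessarily what the patch does: FloorBlockOrdSketch and
floorBlockFor are hypothetical names, and the starting ords are
assumed to be available as a sorted array.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the patch or of Lucene. */
class FloorBlockOrdSketch {

  /**
   * Given the (sorted, ascending) starting ord of each floor block, returns
   * the index of the floor block that would hold targetOrd, i.e. the last
   * block whose starting ord is <= targetOrd, or -1 if targetOrd precedes
   * the first floor block.
   */
  static int floorBlockFor(long[] floorStartOrds, long targetOrd) {
    int idx = Arrays.binarySearch(floorStartOrds, targetOrd);
    if (idx >= 0) {
      return idx; // targetOrd is exactly the first ord of this floor block
    }
    int insertionPoint = -idx - 1; // index of the first start ord > targetOrd
    return insertionPoint - 1;
  }

  public static void main(String[] args) {
    long[] starts = {100, 140, 175}; // made-up starting ords of three floor blocks
    System.out.println(floorBlockFor(starts, 150));
  }
}
```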


> Add block tree postings format that supports term ords
> ------------------------------------------------------
>
>                 Key: LUCENE-5819
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5819
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5819.patch, LUCENE-5819.patch
>
>
> BlockTree is our default terms dictionary today, but it doesn't
> support term ords, which is an optional API in the postings format to
> retrieve the ordinal for the currently seek'd term, and also to later
> seek by that ordinal, e.g. to look up the term.
> This can possibly be useful for e.g. faceting, and maybe at some point
> we can share the postings terms dict with the one used by sorted/set
> DV for cases when app wants to invert and facet on a given field.
> The older (3.x) block terms dict can easily support ords, and we have
> a Lucene41OrdsPF in test-framework, but it's not as fast / compact as
> block-tree, and doesn't (can't easily) implement an optimized
> intersect, but it could be that for fields we'd want to facet on,
> these tradeoffs don't matter.  It's nice to have options...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

