[jira] Resolved: (LUCENE-2654) bulk-code each chunk b/w indexed terms in the terms dict

Robert Muir (JIRA) Mon, 17 Jan 2011 16:26:07 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Muir resolved LUCENE-2654.
---------------------------------

    Resolution: Duplicate

duplicate of LUCENE-2872

> bulk-code each chunk b/w indexed terms in the terms dict
> --------------------------------------------------------
>
>                 Key: LUCENE-2654
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2654
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Michael McCandless
>            Priority: Minor
>
> This is an idea for exploration that came up w/ Robert...
> In PrefixCodedTermsDict (used by the default Standard codec), we encode each 
> term entry "standalone", using vInts.  We store the changed suffix (start, 
> end, bytes), then metadata for the term like docFreq, frq start, prx start, 
> skip start.  Each of these ints is a vInt, which is relatively costly.
> If instead we store the N terms between indexed terms "column-stride", using 
> bulk codec like FOR/PFOR, so that the 32 docFreqs are stored as one block, 32 
> frq deltas as another, etc., then seek and next should be faster.  I.e., we 
> could make decode of the metadata lazy, so that a seek to a term that does 
> not exist may be able to avoid any metadata decode entirely.  Sequential 
> scanning (lots of .next in a row) would also be faster, even if it needs the 
> metadata, since bulk-decode should be faster than multiple vInt decodes.
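
To make the column-stride idea above concrete, here is a minimal sketch in Java. It is not Lucene's code: the class TermBlockSketch, its Block type, and the pack/unpack helpers are hypothetical names made up for illustration. It assumes non-negative docFreqs, stores whole terms instead of prefix-coded suffixes, and uses plain fixed-width bit packing as a stand-in for a real FOR/PFOR codec (no exception handling for outliers). The point is only that one metadata column for a block can be packed together and decoded lazily on a hit.

```java
import java.util.Arrays;

/** Hypothetical sketch of one block of terms with a lazily decoded docFreq column. */
final class TermBlockSketch {

    /** Packs non-negative values at a fixed bit width (1..32) into longs. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) values[i]) << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two longs: spill the high bits into the next word.
                packed[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Bulk-decodes count values of the given bit width. */
    static int[] unpack(long[] packed, int count, int bitsPerValue) {
        int[] values = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    /** The terms between two indexed terms, with docFreqs stored as one packed column. */
    static final class Block {
        final String[] terms;          // sorted; a real dict would prefix-code these
        final long[] packedDocFreqs;   // the docFreq column, bit-packed
        final int bits;                // bit width shared by the whole column
        int[] docFreqs;                // stays null until a lookup actually needs it
        int decodeCount;               // how many times the column was decoded

        Block(String[] terms, int[] docFreqs) {
            this.terms = terms;
            int max = 0;
            for (int df : docFreqs) {
                max = Math.max(max, df);
            }
            this.bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
            this.packedDocFreqs = pack(docFreqs, bits);
        }

        /** Returns the docFreq for term, or -1 if absent. Metadata is decoded only on a hit. */
        int docFreq(String term) {
            int idx = Arrays.binarySearch(terms, term);
            if (idx < 0) {
                return -1; // miss: no metadata decode at all
            }
            if (docFreqs == null) {
                docFreqs = unpack(packedDocFreqs, terms.length, bits);
                decodeCount++;
            }
            return docFreqs[idx];
        }
    }
}
```

With a block holding "apple", "banana", "cherry", a lookup of "blueberry" returns -1 without touching the packed column, while the first successful lookup decodes the whole column once and later lookups reuse it. The same layout would apply to the frq, prx, and skip deltas, each as its own column.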

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


