Terms dict should block-encode terms
------------------------------------
Key: LUCENE-2872
URL: https://issues.apache.org/jira/browse/LUCENE-2872
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2872.patch
With PrefixCodedTermsReader/Writer we now encode each term standalone,
i.e. its bytes, metadata, details for postings (frq/prox file pointers),
etc.
But this is costly when something wants to visit many terms but pull
metadata for only a few (e.g. respelling, certain MultiTermQueries).
This is particularly costly for the sep codec because it has more
metadata to store per term.
So instead I think we should block-encode all terms between indexed
terms, so that the metadata is stored "column stride". This makes it
faster to enumerate just the terms, since the per-term metadata no
longer has to be decoded along the way.
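
To make the "column stride" idea concrete, here is a minimal sketch of
one block. It is only an illustration under assumed names and an
assumed layout (TermBlockSketch, encodeBlock, readTerms, readPointer,
fixed-width ints/longs); it is not the format used by the attached
patch or by any Lucene codec. The block stores all term bytes first,
then one column per metadata field, so a terms-only scan never touches
the pointers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical column-stride term block; names and layout are assumptions. */
public class TermBlockSketch {

    /** Layout: [count][termSectionLength][terms...][frq column][prx column]. */
    public static byte[] encodeBlock(List<String> terms, long[] frq, long[] prx) {
        try {
            // Serialize the term bytes on their own so the section length is known.
            ByteArrayOutputStream termBytes = new ByteArrayOutputStream();
            DataOutputStream termOut = new DataOutputStream(termBytes);
            for (String term : terms) {
                byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
                termOut.writeInt(utf8.length);
                termOut.write(utf8);
            }
            termOut.flush();

            ByteArrayOutputStream block = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(block);
            out.writeInt(terms.size());
            out.writeInt(termBytes.size());
            termBytes.writeTo(out);
            // Metadata is written one column at a time, not interleaved per term.
            for (long p : frq) out.writeLong(p);
            for (long p : prx) out.writeLong(p);
            out.flush();
            return block.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Enumerates only the terms; the metadata columns are never read. */
    public static List<String> readTerms(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            in.readInt(); // term section length, not needed for a plain scan
            List<String> terms = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                terms.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return terms;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one pointer for one term; column 0 is frq, column 1 is prx. */
    public static long readPointer(byte[] block, int column, int termIndex) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            int termSectionLength = in.readInt();
            if (column < 0 || column > 1 || termIndex < 0 || termIndex >= count) {
                throw new IndexOutOfBoundsException("column=" + column + " term=" + termIndex);
            }
            // Skip the term bytes and any earlier column, then index into this one.
            in.skipBytes(termSectionLength + (column * count + termIndex) * Long.BYTES);
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, something like respelling would call readTerms over
each block and use readPointer only for the few terms it actually
needs. A real implementation would presumably use variable-length ints
and shared term prefixes instead of fixed-width values; those are left
out here to keep the sketch short.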