Ryan Ernst created LUCENE-6030:
----------------------------------
Summary: Add norms patched compression which uses table for most
common values
Key: LUCENE-6030
URL: https://issues.apache.org/jira/browse/LUCENE-6030
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ryan Ernst
We added the PATCHED norms sub-format in Lucene 50, which uses a bitset to
mark documents that have the most common value (when >97% of the documents have
that value). This works well for fields that have one predominant value length
plus a small number of docs with other, arbitrary values. But another common
case is a field with a handful of very common value lengths, such as a title
field.
We can use a table (see TABLE_COMPRESSION) to store the most common values, and
reserve one ordinal for the "other" case. A document with that ordinal is then
looked up in the secondary patch table.
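
To make the idea concrete, here is a rough sketch of the encoding, not the
actual codec code. The class name, the HashMap used as the patch table, and the
one-byte-per-doc ordinals are all assumptions made for illustration; a real
format would presumably bit-pack the ordinals and write the patches to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: table of common norm values plus a patch table for the rest. */
final class PatchedTableNormsSketch {
    final byte[] table;               // most common values, indexed by ordinal
    final byte[] ordinals;            // one ordinal per doc; table.length means "other"
    final Map<Integer, Byte> patches; // doc id -> value, only for "other" docs

    private PatchedTableNormsSketch(byte[] table, byte[] ordinals, Map<Integer, Byte> patches) {
        this.table = table;
        this.ordinals = ordinals;
        this.patches = patches;
    }

    /** Encodes one norm byte per doc; tableSize must be below 127 in this sketch. */
    static PatchedTableNormsSketch encode(byte[] norms, int tableSize) {
        // Count how often each byte value occurs.
        int[] freq = new int[256];
        for (byte b : norms) {
            freq[b & 0xFF]++;
        }
        // Sort the values that occur by descending frequency (ties: smaller value first).
        List<Integer> candidates = new ArrayList<>();
        for (int v = 0; v < 256; v++) {
            if (freq[v] > 0) {
                candidates.add(v);
            }
        }
        candidates.sort((a, b) -> freq[a] != freq[b]
                ? Integer.compare(freq[b], freq[a])
                : Integer.compare(a, b));

        // The first n values go into the table; everything else maps to ordinal n.
        int n = Math.min(tableSize, candidates.size());
        byte[] table = new byte[n];
        int[] ordOf = new int[256];
        Arrays.fill(ordOf, n);
        for (int i = 0; i < n; i++) {
            int v = candidates.get(i);
            table[i] = (byte) v;
            ordOf[v] = i;
        }

        // Write one ordinal per doc and record a patch for each "other" doc.
        byte[] ordinals = new byte[norms.length];
        Map<Integer, Byte> patches = new HashMap<>();
        for (int doc = 0; doc < norms.length; doc++) {
            int ord = ordOf[norms[doc] & 0xFF];
            ordinals[doc] = (byte) ord;
            if (ord == n) {
                patches.put(doc, norms[doc]);
            }
        }
        return new PatchedTableNormsSketch(table, ordinals, patches);
    }

    /** Returns the norm for a doc: table lookup first, patch table for "other". */
    byte get(int doc) {
        int ord = ordinals[doc];
        return ord < table.length ? table[ord] : patches.get(doc);
    }
}
```

With a table of three entries, each doc needs only two bits for its ordinal
(three table slots plus "other") once bit-packed, and the patch table stays
small as long as the common values cover nearly all documents.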