Ryan Ernst created LUCENE-6030:
----------------------------------

             Summary: Add norms patched compression which uses table for most common values
                 Key: LUCENE-6030
                 URL: https://issues.apache.org/jira/browse/LUCENE-6030
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Ryan Ernst


We have added the PATCHED norms sub-format in Lucene50, which uses a bitset to
mark documents that have the most common value (when >97% of the documents have
that value). This works well for fields that have one predominant value length
plus a small number of docs with other, arbitrary values. But another common
case is a field with a handful of very common value lengths, such as a title
field.
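For reference, a minimal sketch of the existing bitset-patched idea as described above, assuming an in-memory long[] of per-document norms. The class name PatchedBitsetNorms, the sparse sorted patch arrays, and the null return for "use another sub-format" are all hypothetical illustration choices, not Lucene's actual on-disk layout; only the >97% threshold comes from this issue.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "patched bitset" norms layout; not Lucene's real format.
final class PatchedBitsetNorms {
  // Mirrors the ">97% of documents" rule described in the issue.
  static final double MIN_COMMON_FRACTION = 0.97;

  final long commonValue;
  final BitSet hasCommon;   // bit d set => doc d has commonValue
  final int[] patchDocs;    // sorted ids of docs that do NOT have commonValue
  final long[] patchValues; // their values, parallel to patchDocs

  private PatchedBitsetNorms(long commonValue, BitSet hasCommon,
                             int[] patchDocs, long[] patchValues) {
    this.commonValue = commonValue;
    this.hasCommon = hasCommon;
    this.patchDocs = patchDocs;
    this.patchValues = patchValues;
  }

  // Returns null when no single value covers more than 97% of docs,
  // i.e. when some other norms sub-format should be chosen instead.
  static PatchedBitsetNorms encode(long[] norms) {
    Map<Long, Integer> counts = new HashMap<>();
    for (long v : norms) {
      counts.merge(v, 1, Integer::sum);
    }
    long common = 0;
    int best = 0;
    for (Map.Entry<Long, Integer> e : counts.entrySet()) {
      if (e.getValue() > best) {
        best = e.getValue();
        common = e.getKey();
      }
    }
    if (norms.length == 0 || best <= MIN_COMMON_FRACTION * norms.length) {
      return null;
    }
    BitSet hasCommon = new BitSet(norms.length);
    int[] patchDocs = new int[norms.length - best];
    long[] patchValues = new long[norms.length - best];
    int p = 0;
    for (int doc = 0; doc < norms.length; doc++) {
      if (norms[doc] == common) {
        hasCommon.set(doc);
      } else {
        // Docs are visited in increasing order, so patchDocs stays sorted.
        patchDocs[p] = doc;
        patchValues[p] = norms[doc];
        p++;
      }
    }
    return new PatchedBitsetNorms(common, hasCommon, patchDocs, patchValues);
  }

  // Assumes 0 <= doc < number of docs passed to encode().
  long get(int doc) {
    if (hasCommon.get(doc)) {
      return commonValue;
    }
    return patchValues[Arrays.binarySearch(patchDocs, doc)];
  }
}
```

With a title-like field (say five common lengths, none above 97%), encode() returns null, which is exactly the gap this issue is about.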

We can use a table (see TABLE_COMPRESSION) to store the most common values, and
reserve an ordinal for the "other" case, in which case we look up the value in a
secondary patch table.
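A minimal sketch of the proposed table-plus-patch idea, assuming the same in-memory long[] of norms. TablePatchedNorms, the maxTableSize parameter, the byte[] of ordinals (a real format would bit-pack them to ceil(log2(tableSize + 1)) bits), and the binary-searched patch arrays are all hypothetical choices for illustration, not a proposed final format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: table of most common values + reserved "other" ordinal + patch table.
final class TablePatchedNorms {
  final long[] table;       // most common values, indexed by ordinal
  final int otherOrd;       // reserved ordinal meaning "look in the patch table"
  final byte[] ords;        // one ordinal per doc (unsigned; not bit-packed here)
  final int[] patchDocs;    // sorted ids of docs whose value is not in the table
  final long[] patchValues; // their values, parallel to patchDocs

  private TablePatchedNorms(long[] table, byte[] ords, int[] patchDocs, long[] patchValues) {
    this.table = table;
    this.otherOrd = table.length;
    this.ords = ords;
    this.patchDocs = patchDocs;
    this.patchValues = patchValues;
  }

  static TablePatchedNorms encode(long[] norms, int maxTableSize) {
    // Ordinals 0..maxTableSize must fit in an unsigned byte.
    if (maxTableSize < 1 || maxTableSize > 255) {
      throw new IllegalArgumentException("maxTableSize must be in [1, 255]");
    }
    Map<Long, Integer> counts = new HashMap<>();
    for (long v : norms) {
      counts.merge(v, 1, Integer::sum);
    }
    // Keep the maxTableSize most frequent values in the table.
    List<Map.Entry<Long, Integer>> byFreq = new ArrayList<>(counts.entrySet());
    byFreq.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    int size = Math.min(maxTableSize, byFreq.size());
    long[] table = new long[size];
    Map<Long, Integer> ordOf = new HashMap<>();
    for (int i = 0; i < size; i++) {
      table[i] = byFreq.get(i).getKey();
      ordOf.put(table[i], i);
    }
    int patches = 0;
    for (long v : norms) {
      if (!ordOf.containsKey(v)) {
        patches++;
      }
    }
    byte[] ords = new byte[norms.length];
    int[] patchDocs = new int[patches];
    long[] patchValues = new long[patches];
    int p = 0;
    for (int doc = 0; doc < norms.length; doc++) {
      Integer ord = ordOf.get(norms[doc]);
      if (ord != null) {
        ords[doc] = (byte) ord.intValue();
      } else {
        // "Other" case: reserved ordinal, real value goes to the patch table.
        ords[doc] = (byte) size;
        patchDocs[p] = doc;
        patchValues[p] = norms[doc];
        p++;
      }
    }
    return new TablePatchedNorms(table, ords, patchDocs, patchValues);
  }

  // Assumes 0 <= doc < number of docs passed to encode().
  long get(int doc) {
    int ord = ords[doc] & 0xFF;
    if (ord != otherOrd) {
      return table[ord];
    }
    return patchValues[Arrays.binarySearch(patchDocs, doc)];
  }
}
```

For example, encoding {3, 3, 4, 3, 5, 4, 3, 99, 4, 3, 5, 42} with maxTableSize 3 gives the table {3, 4, 5}, reserves ordinal 3 for "other", and leaves only docs 7 and 11 in the patch table.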



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)