[jira] [Resolved] (LUCENE-5743) new 4.9 norms format

Robert Muir (JIRA) Tue, 10 Jun 2014 05:50:20 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Muir resolved LUCENE-5743.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 5.0
                   4.9

I added the Arrays.sort(), and also took a step towards a BaseNormsFormatTestCase. 
I've always been concerned that we didn't have enough tests exercising the norms 
directly...

> new 4.9 norms format
> --------------------
>
>                 Key: LUCENE-5743
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5743
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5743.patch
>
>
> Norms can eat up a lot of RAM, since by default it's 8 bits per field per 
> document. We rely upon users to omit them to not blow up RAM, but it's a 
> constant trap.
> Previously in 4.2, I tried to compress these by default, but it was too slow. 
> My mistakes were:
> * allowing slow bits per value like bpv=5 that are implemented with expensive 
> operations.
> * trying to wedge norms into the generalized docvalues numeric case
> * not handling "simple" degenerate cases like "constant norm" (the same norm 
> value for every document).
> Instead, we can just have a separate norms format that is very careful about 
> what it does, since we understand in general the patterns in the data:
> * uses CONSTANT compression (just writes the single value to metadata) when 
> all values are the same.
> * only compresses to bitsPerValue = 1,2,4 (this also happens often, for very 
> short text fields like person names and other stuff in structured data)
> * otherwise, if you would need 5,6,7,8 bits per value, we just continue to do 
> what we do today, encode as byte[]. Maybe we can improve this later, but this 
> ensures we don't have a performance impact.
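
As a rough illustration of the selection rules in the description above (this 
is not the code from LUCENE-5743.patch), a minimal Java sketch could look like 
the following. The class, enum, and method names are hypothetical. It also 
assumes that the small bit widths are reached by mapping each distinct norm 
byte to its ordinal in a sorted table of unique values, which the description 
does not spell out.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from Lucene itself.
final class NormsEncodingSketch {

    /** Strategy plus the bits per value it would use (0 = stored once in metadata). */
    enum Strategy {
        CONSTANT(0), PACKED_1(1), PACKED_2(2), PACKED_4(4), RAW_BYTES(8);

        final int bitsPerValue;

        Strategy(int bitsPerValue) {
            this.bitsPerValue = bitsPerValue;
        }
    }

    /** Returns the distinct norm values in sorted (signed byte) order. */
    static byte[] uniqueSorted(byte[] norms) {
        boolean[] seen = new boolean[256];
        int count = 0;
        for (byte b : norms) {
            if (!seen[b & 0xFF]) {
                seen[b & 0xFF] = true;
                count++;
            }
        }
        byte[] table = new byte[count];
        int n = 0;
        for (int v = 0; v < 256; v++) {
            if (seen[v]) {
                table[n++] = (byte) v;
            }
        }
        Arrays.sort(table); // sorted so ordinals can be found by binary search
        return table;
    }

    /** Picks a strategy from the number of distinct values (assumed table-based scheme). */
    static Strategy choose(int distinctValues) {
        if (distinctValues <= 1) return Strategy.CONSTANT;
        if (distinctValues <= 2) return Strategy.PACKED_1;
        if (distinctValues <= 4) return Strategy.PACKED_2;
        if (distinctValues <= 16) return Strategy.PACKED_4;
        return Strategy.RAW_BYTES; // 5..8 bits needed: keep the plain byte[]
    }

    /** Packs table ordinals at 1, 2, or 4 bits per value, lowest bits first. */
    static byte[] pack(byte[] norms, byte[] table, int bpv) {
        int perByte = 8 / bpv;
        byte[] out = new byte[(norms.length + perByte - 1) / perByte];
        for (int i = 0; i < norms.length; i++) {
            int ord = Arrays.binarySearch(table, norms[i]);
            out[i / perByte] |= (byte) (ord << ((i % perByte) * bpv));
        }
        return out;
    }

    /** Reads back the ordinal stored at the given document index. */
    static int get(byte[] packed, int bpv, int index) {
        int perByte = 8 / bpv;
        int shift = (index % perByte) * bpv;
        return (packed[index / perByte] >>> shift) & ((1 << bpv) - 1);
    }

    public static void main(String[] args) {
        byte[] norms = {10, 20, 10, 30, 20, 10};
        byte[] table = uniqueSorted(norms);
        Strategy strategy = choose(table.length);
        System.out.println(strategy + " at " + strategy.bitsPerValue + " bpv");
    }
}
```

Restricting the packed widths to 1, 2, and 4 means a value never straddles a 
byte boundary, so a lookup in this sketch is one array read, one shift, and 
one mask.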



--
This message was sent by Atlassian JIRA
(v6.2#6252)

