[ https://issues.apache.org/jira/browse/LUCENE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-5743.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 5.0
                   4.9

I added the Arrays.sort(), also a step towards a BaseNormsFormatTestCase. I've always been concerned that we didn't have enough stuff testing the norms directly...

> new 4.9 norms format
> --------------------
>
>          Key: LUCENE-5743
>          URL: https://issues.apache.org/jira/browse/LUCENE-5743
>      Project: Lucene - Core
>   Issue Type: New Feature
>     Reporter: Robert Muir
>      Fix For: 4.9, 5.0
>
>  Attachments: LUCENE-5743.patch
>
>
> Norms can eat up a lot of RAM, since by default it's 8 bits per field per
> document. We rely upon users to omit them to not blow up RAM, but it's a
> constant trap.
> Previously in 4.2, I tried to compress these by default, but it was too slow.
> My mistakes were:
> * allowing slow bits per value like bpv=5 that are implemented with expensive
> operations.
> * trying to wedge norms into the generalized docvalues numeric case
> * not handling "simple" degraded cases like "constant norm" (the same norm
> value for every document).
> Instead, we can just have a separate norms format that is very careful about
> what it does, since we understand in general the patterns in the data:
> * uses CONSTANT compression (just writes the single value to metadata) when
> all values are the same.
> * only compresses to bitsPerValue = 1,2,4 (this also happens often, for very
> short text fields like person names and other stuff in structured data)
> * otherwise, if you would need 5,6,7,8 bits per value, we just continue to do
> what we do today, encode as byte[]. Maybe we can improve this later, but this
> ensures we don't have a performance impact.
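
The selection rules in the issue description can be sketched roughly as follows. This is only an illustration, not the code in LUCENE-5743.patch: the class name, the enum labels, and the rule of deriving bits per value from the number of distinct norm values (as if each value were mapped to a small table ordinal) are all assumptions made for the example.

```java
public class NormsEncodingSketch {

    /** Hypothetical labels for the strategies listed in the issue description. */
    public enum Encoding { CONSTANT, PACKED_1, PACKED_2, PACKED_4, RAW_BYTES }

    /**
     * Picks an encoding for one field's norms (one byte per document).
     * Assumption: the cost is driven by the number of distinct values,
     * so 2 distinct values fit in 1 bit, 4 in 2 bits, 16 in 4 bits.
     */
    public static Encoding choose(byte[] norms) {
        boolean[] seen = new boolean[256];
        int distinct = 0;
        for (byte b : norms) {
            int idx = b & 0xFF; // treat the byte as unsigned for indexing
            if (!seen[idx]) {
                seen[idx] = true;
                distinct++;
            }
        }
        if (distinct <= 1) {
            // Same norm for every document (or no documents at all):
            // a single value in metadata is enough.
            return Encoding.CONSTANT;
        }
        if (distinct <= 2) {
            return Encoding.PACKED_1;
        }
        if (distinct <= 4) {
            return Encoding.PACKED_2;
        }
        if (distinct <= 16) {
            return Encoding.PACKED_4;
        }
        // Would need 5 to 8 bits per value: keep the plain byte[] layout
        // rather than pay for a slow, unaligned bit width.
        return Encoding.RAW_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(choose(new byte[] {7, 7, 7}));
        System.out.println(choose(new byte[] {1, 2, 3, 1}));
    }
}
```

Restricting the packed widths to 1, 2, and 4 keeps every value inside a single byte, which is the property the description relies on to avoid the expensive operations seen with widths such as bpv=5.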