[ 
https://issues.apache.org/jira/browse/LUCENE-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7475:
---------------------------------
    Attachment: LUCENE-7475.patch

Here is a patch that:
 - fixes NormValuesWriter to support sparse norms
 - adds a new Lucene70NormsFormat that supports sparsity and only encodes norms 
for documents that have a norm
 - adds a {{codecSupportsSparsity}} method to BaseNormsFormatTestCase so that 
modern norms formats can get proper testing of the sparse case
 - fixes SimpleTextNormsFormat to support sparsity
 - moves Lucene53NormsFormat to the backward-codecs module
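
To illustrate what "only encodes norms for documents that have a norm" means in combination with an iterator-style API, here is a minimal sketch. It is not the encoding used by Lucene70NormsFormat; the class SparseNormsSketch, its methods and the in-memory layout are hypothetical and only mimic the nextDoc()/longValue() iteration style.

```java
/**
 * Hypothetical illustration of sparse norms storage. This is NOT the
 * on-disk format from the patch: it only shows that documents without a
 * norm take no space and are skipped by the iterator.
 */
final class SparseNormsSketch {
    /** Sentinel returned once the iterator is exhausted (assumed convention). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;   // sorted doc IDs that have a norm
    private final long[] norms; // norms[i] is the norm of docs[i]
    private int index = -1;     // position of the current document

    private SparseNormsSketch(int[] docs, long[] norms) {
        this.docs = docs;
        this.norms = norms;
    }

    /** Builds the sparse form from a per-document array; null means "no norm". */
    static SparseNormsSketch fromPerDoc(Long[] perDoc) {
        int count = 0;
        for (Long value : perDoc) {
            if (value != null) {
                count++;
            }
        }
        int[] docs = new int[count];
        long[] norms = new long[count];
        int i = 0;
        for (int doc = 0; doc < perDoc.length; doc++) {
            if (perDoc[doc] != null) {
                docs[i] = doc;
                norms[i] = perDoc[doc];
                i++;
            }
        }
        return new SparseNormsSketch(docs, norms);
    }

    /** Advances to the next document that has a norm, or NO_MORE_DOCS. */
    int nextDoc() {
        index++;
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    /** Norm of the current document; only valid after a successful nextDoc(). */
    long longValue() {
        return norms[index];
    }

    /** Number of documents that actually have a norm. */
    int cardinality() {
        return docs.length;
    }
}
```

In this sketch a norm of zero is still stored as a value, so "has a norm equal to 0" and "has no norm" remain distinguishable, which a dense encoding that assigns 0 to missing documents cannot express.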

Notes:
 - the current patch assigns a norm value of zero to fields that generate no 
tokens (this can happen e.g. with the empty string or if all tokens are stop 
words) and only considers that a document does not have norms if no text field 
was indexed at all. We could also decide that fields that generate no tokens 
are considered missing too; I think both approaches can make sense.
 - the new Lucene70NormsFormat is only a first step; it can certainly be 
improved in follow-up issues
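
The two options from the first note can be contrasted with a small sketch. Everything here is hypothetical: the class EmptyFieldNormPolicy, its method names and the placeholder encode() are illustration only, and real norm values are computed differently (the placeholder just returns the token count).

```java
import java.util.OptionalLong;

/**
 * Hypothetical comparison of the two policies for fields that generate
 * no tokens. An empty OptionalLong means "this document has no norm".
 */
final class EmptyFieldNormPolicy {

    /** Policy described for the current patch: no tokens still yields a norm of 0. */
    static OptionalLong zeroForEmpty(boolean fieldIndexed, int tokenCount) {
        if (!fieldIndexed) {
            return OptionalLong.empty(); // no text field indexed: norm is missing
        }
        return OptionalLong.of(tokenCount == 0 ? 0L : encode(tokenCount));
    }

    /** Alternative policy: a field that generates no tokens counts as missing. */
    static OptionalLong missingForEmpty(boolean fieldIndexed, int tokenCount) {
        if (!fieldIndexed || tokenCount == 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(encode(tokenCount));
    }

    // Placeholder only: stands in for the real norm computation.
    private static long encode(int tokenCount) {
        return tokenCount;
    }
}
```

The two policies only differ for an indexed field with zero tokens: the first returns a present value of 0, the second returns an empty result.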

> Sparse norms
> ------------
>
>                 Key: LUCENE-7475
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7475
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7475.patch
>
>
> Even though norms now have an iterator API, they are still always dense in 
> practice since documents that do not have a value get assigned 0 as a norm 
> value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
