[ https://issues.apache.org/jira/browse/LUCENE-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7475:
---------------------------------
    Attachment: LUCENE-7475.patch

Here is a patch that:
 - fixes NormValuesWriter to support sparse norms
 - adds a new Lucene70NormsFormat that supports sparsity and only encodes norms for documents that have a norm
 - adds a {{codecSupportsSparsity}} method to BaseNormsFormatTestCase so that modern norms formats can get proper testing of the sparse case
 - fixes SimpleTextNormsFormat to support sparsity
 - moves Lucene53NormsFormat to the backward-codecs module

Notes:
 - the current patch assigns a norm value of zero to fields that generate no tokens (this can happen e.g. with the empty string or if all tokens are stop words) and only considers that a document does not have norms if no text field was indexed at all. We could also decide that fields that generate no tokens are considered missing too; I think both approaches can make sense.
 - the new Lucene70NormsFormat is only a first step; it can certainly be improved in further issues

> Sparse norms
> ------------
>
>                 Key: LUCENE-7475
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7475
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7475.patch
>
>
> Even though norms now have an iterator API, they are still always dense in
> practice since documents that do not have a value get assigned 0 as a norm
> value.
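
As a rough illustration of the sparse case described in the patch summary above, here is a minimal, self-contained sketch of storing norms only for documents that have one and reading them back through an iterator-style API. It is not the encoding used by Lucene70NormsFormat and does not use any Lucene classes: the class {{SparseNormsSketch}}, its methods, and the bit-set-plus-array layout are all hypothetical, chosen only to show how a norm of zero (a field that produced no tokens) can be kept distinct from a missing norm (no text field indexed).

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of sparse norm storage. Not Lucene code and not the
 * Lucene70NormsFormat encoding; all names here are assumptions for illustration.
 */
final class SparseNormsSketch {
    /** Sentinel returned once the iterator is exhausted (an assumption of this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BitSet docsWithNorm = new BitSet(); // which doc IDs have a norm
    private final long[] values;                      // one slot per doc that has a norm
    private int doc = -1;                             // current doc ID, -1 before iteration
    private int ord = -1;                             // index into values for the current doc

    /** normPerDoc[docId] == null means the document has no norm at all. */
    SparseNormsSketch(Long[] normPerDoc) {
        int count = 0;
        for (Long norm : normPerDoc) {
            if (norm != null) {
                count++;
            }
        }
        values = new long[count];
        int next = 0;
        for (int docId = 0; docId < normPerDoc.length; docId++) {
            if (normPerDoc[docId] != null) {
                docsWithNorm.set(docId);
                values[next++] = normPerDoc[docId];
            }
        }
    }

    /** Advances to the next document that has a norm, or returns NO_MORE_DOCS. */
    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        int next = docsWithNorm.nextSetBit(doc + 1);
        if (next < 0) {
            doc = NO_MORE_DOCS;
        } else {
            doc = next;
            ord++;
        }
        return doc;
    }

    /** Norm of the current document; only valid after nextDoc() returned a real doc ID. */
    long longValue() {
        return values[ord];
    }

    /** Number of norm values actually stored (a dense layout would store one per document). */
    int storedValueCount() {
        return values.length;
    }
}
```

For example, given five documents with norms {{12, 0, null, null, 7}}, this sketch stores three values and iterates over documents 0, 1 and 4 only. Document 1 keeps its explicit zero, while documents 2 and 3 are skipped as missing, which matches the first of the two approaches mentioned in the notes.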