[ https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-5542.
----------------------------------------
    Resolution: Duplicate

Dup of LUCENE-7407. We now pass a {{DocValuesProducer}} to all the {{addXYZField}} when writing doc values.

> Explore making DVConsumer sparse-aware
> --------------------------------------
>
>                 Key: LUCENE-5542
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5542
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shai Erera
>
> Today DVConsumer API requires the caller to pass a value for every document,
> where {{null}} means "this doc has no value". The Codec can then choose how
> to encode the values, i.e. whether it encodes a 0 for a numeric field, or
> encodes the sparse docs. In practice, from what I see, we choose to encode
> the 0s.
> I wonder if we e.g. added an {{Iterable<Number>}} to
> DVConsumer.addXYZField(), if that would make a better API. The caller only
> passes <doc,value> pairs and it's up to the Codec to decide how it wants to
> encode the missing values. Like, if a user's app truly has a sparse NDV,
> IndexWriter doesn't need to "fill the gaps" artificially. It's the job of the
> Codec.
> To be clear, I don't propose to change any Codec implementation in this issue
> (w.r.t. sparse encoding - yes/no), only change the API to reflect that
> sparseness. I think that if we'll ever want to encode sparse values, it will
> be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinion before
> I start.
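
For illustration only, the difference between the two API shapes discussed in the quoted description might be sketched as below. This is not Lucene's actual API: the class and the names {{DocValue}}, {{fromDense}} and {{encodeZeroFilled}} are hypothetical, invented for this sketch. It only assumes what the description states, namely that today the caller passes one value per document with {{null}} meaning "no value", and that the proposal is for the caller to pass only <doc,value> pairs and leave gap handling to the Codec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Lucene.
class SparseDocValuesSketch {

    /** A <doc,value> pair as proposed in the issue (hypothetical type). */
    public static final class DocValue {
        public final int docId;
        public final long value;

        public DocValue(int docId, long value) {
            this.docId = docId;
            this.value = value;
        }
    }

    /**
     * Converts the dense form (one entry per document, null = no value)
     * into the sparse form (only documents that actually have a value).
     */
    public static List<DocValue> fromDense(List<Long> perDocValues) {
        List<DocValue> pairs = new ArrayList<>();
        for (int doc = 0; doc < perDocValues.size(); doc++) {
            Long v = perDocValues.get(doc);
            if (v != null) {
                pairs.add(new DocValue(doc, v));
            }
        }
        return pairs;
    }

    /**
     * One possible codec-side choice: fill the gaps with 0.
     * With a sparse-aware API this decision lives in the codec,
     * not in the caller.
     */
    public static long[] encodeZeroFilled(Iterable<DocValue> pairs, int maxDoc) {
        long[] out = new long[maxDoc]; // Java zero-initializes the array
        for (DocValue p : pairs) {
            out[p.docId] = p.value;
        }
        return out;
    }

    public static void main(String[] args) {
        // Four documents; only docs 0 and 3 have a value.
        List<Long> dense = Arrays.asList(7L, null, null, 42L);
        List<DocValue> sparse = fromDense(dense);
        System.out.println("docs with a value: " + sparse.size());
        System.out.println("zero-filled: " + Arrays.toString(encodeZeroFilled(sparse, dense.size())));
    }
}
```

In this sketch a codec that prefers a truly sparse encoding could consume the same pairs without ever seeing the artificial zeros, which is the convenience the description argues for.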