[ https://issues.apache.org/jira/browse/LUCENE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942754#comment-13942754 ]
Shai Erera commented on LUCENE-5542:
------------------------------------
I don't think it makes the API more complicated. To users of the API we say
"pass only docs with values". To Codec developers we say "you are going to
get only docs with values, so encode however you see fit, as long as you can
later provide docsWithFields efficiently". This isn't about performance yet;
it's about making the API clear (in my opinion). Stating that {{null}} denotes
a missing value for a document is no better than simply not passing the
document in the first place.
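
To make the two API shapes concrete, here is a minimal sketch of the contrast.
All names in it (SparseDocValuesSketch, DocValue, encodeDense,
encodeSparseAsDense, docsWithField) are hypothetical and written only for
illustration; none of them is Lucene's actual DVConsumer API, and the "codec"
here is just a method that fills a long[].

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class SparseDocValuesSketch {

    /** Hypothetical <doc,value> pair; not a Lucene class. */
    public static final class DocValue {
        final int docId;
        final long value;

        public DocValue(int docId, long value) {
            this.docId = docId;
            this.value = value;
        }
    }

    /** Dense shape: one entry per document, where null means "no value". */
    public static long[] encodeDense(Iterable<Number> valuesPerDoc, int maxDoc) {
        long[] encoded = new long[maxDoc];
        int doc = 0;
        for (Number v : valuesPerDoc) {
            // The missing value is encoded as 0.
            encoded[doc++] = (v == null) ? 0L : v.longValue();
        }
        return encoded;
    }

    /** Sparse shape: the caller passes only docs that have a value. */
    public static long[] encodeSparseAsDense(Iterable<DocValue> pairs, int maxDoc) {
        // This toy codec still chooses a dense encoding; the gaps stay 0
        // because the codec, not the caller, decided to fill them.
        long[] encoded = new long[maxDoc];
        for (DocValue p : pairs) {
            encoded[p.docId] = p.value;
        }
        return encoded;
    }

    /** With the sparse shape, "which docs have a value" falls out of the input. */
    public static BitSet docsWithField(Iterable<DocValue> pairs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (DocValue p : pairs) {
            bits.set(p.docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 4;
        List<Number> dense = Arrays.<Number>asList(7L, null, null, 42L);
        List<DocValue> sparse = Arrays.asList(new DocValue(0, 7L), new DocValue(3, 42L));

        System.out.println(Arrays.toString(encodeDense(dense, maxDoc)));
        System.out.println(Arrays.toString(encodeSparseAsDense(sparse, maxDoc)));
        System.out.println(docsWithField(sparse, maxDoc));
    }
}
```

Both calls can produce the same encoding; the difference is only in who decides
what happens to documents without a value.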
> Explore making DVConsumer sparse-aware
> --------------------------------------
>
> Key: LUCENE-5542
> URL: https://issues.apache.org/jira/browse/LUCENE-5542
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Shai Erera
>
> Today DVConsumer API requires the caller to pass a value for every document,
> where {{null}} means "this doc has no value". The Codec can then choose how
> to encode the values, i.e. whether it encodes a 0 for a numeric field, or
> encodes the sparse docs. In practice, from what I see, we choose to encode
> the 0s.
> I wonder whether adding e.g. an {{Iterable<Number>}} to
> DVConsumer.addXYZField() would make a better API. The caller would pass only
> <doc,value> pairs, and it would be up to the Codec to decide how to encode
> the missing values. For example, if a user's app truly has a sparse NDV,
> IndexWriter doesn't need to "fill the gaps" artificially. That is the job of
> the Codec.
> To be clear, I don't propose to change any Codec implementation in this issue
> (w.r.t. sparse encoding - yes/no), only to change the API to reflect that
> sparseness. I think that if we ever want to encode sparse values, it will
> be a more convenient API.
> Thoughts? I volunteer to do this work, but want to get others' opinion before
> I start.