[ https://issues.apache.org/jira/browse/LUCENE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170780#comment-14170780 ]
Robert Muir commented on LUCENE-6006:
-------------------------------------
{quote}
Well, we can't keep compromising Lucene's internal design for the "least common
denominator" of codecs out there. If you are one of the apps hitting this
exotic use case, you'll need to use a codec that can sparse-encode your norms.
{quote}
I don't see it as a design compromise so much as pushing the burden of all
the "crazy things users do" into the codec: that makes every codec more difficult
to write, because if a codec doesn't incorporate this optimization, users with
100,000 fields will complain.
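To make "this optimization" concrete, here is a minimal sketch of sparse norms encoding, assuming norms are plain long values per document and that 0 is the default (the same default the issue description uses). SparseNormsSketch, SparseNorms, encode, get and storedEntries are hypothetical names invented for illustration; this is not Lucene's codec API and it ignores on-disk layout entirely.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's codec API: store norms only for documents
// whose norm differs from the default 0, so a field that no document actually
// gave norms to costs nothing per document.
final class SparseNormsSketch {

    /** Sparse norms for one field: parallel arrays of doc IDs and values. */
    static final class SparseNorms {
        private final int[] docs;    // sorted doc IDs that have a non-zero norm
        private final long[] values; // values[i] is the norm of docs[i]

        SparseNorms(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }

        /** Returns the norm for docId, or the default 0 if none was stored. */
        long get(int docId) {
            int idx = Arrays.binarySearch(docs, docId);
            return idx >= 0 ? values[idx] : 0L;
        }

        int storedEntries() {
            return docs.length;
        }
    }

    /** Encodes a dense per-document norms array, keeping only non-zero entries. */
    static SparseNorms encode(long[] dense) {
        int count = 0;
        for (long v : dense) {
            if (v != 0L) count++;
        }
        int[] docs = new int[count];
        long[] values = new long[count];
        int j = 0;
        for (int doc = 0; doc < dense.length; doc++) {
            if (dense[doc] != 0L) {
                docs[j] = doc; // doc IDs are visited in order, so docs stays sorted
                values[j] = dense[doc];
                j++;
            }
        }
        return new SparseNorms(docs, values);
    }

    public static void main(String[] args) {
        long[] dense = new long[1000]; // a field where no document added norms
        System.out.println(encode(dense).storedEntries()); // 0
        dense[42] = 7L;
        SparseNorms one = encode(dense);
        System.out.println(one.get(42) + " " + one.get(43)); // 7 0
    }
}
```

The point is only the cost model: an all-default field stores zero entries, whereas a dense encoding stores one value per document for each of those 100,000 fields.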
Even so, the patch might be the right way to go. I'm just sad that it doesn't
actually clean this stuff up, and instead waits for it to "age away" in 6.x.
I would nuke this boolean in 5.x instead. That just means the 4.x FieldInfos
readers need to set a codec attribute for the "old boolean", and the 4.x norms
readers need to look for that attribute and, if they see it, return
DocValues.emptyNumeric() for the field.
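The proposed 5.x cleanup can be sketched as follows, as a simplified model rather than real codec code. FieldMeta, LEGACY_HAS_NORMS_KEY (and its "legacy.hasNorms" value), readLegacyField and getNorms are all hypothetical; IntToLongFunction stands in for a per-document numeric reader, and EMPTY_NUMERIC stands in for DocValues.emptyNumeric(), modelled as reading 0 for every document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToLongFunction;

// Hypothetical sketch of the proposed 5.x cleanup; every name here is
// invented for illustration and is not a Lucene API.
final class LegacyNormsAttributeSketch {

    // Hypothetical attribute key standing in for the "old boolean".
    static final String LEGACY_HAS_NORMS_KEY = "legacy.hasNorms";

    /** Minimal stand-in for per-field metadata: a name plus string attributes. */
    static final class FieldMeta {
        final String name;
        final Map<String, String> attributes = new HashMap<>();

        FieldMeta(String name) {
            this.name = name;
        }
    }

    /** 4.x field-infos reader: record the old on-disk boolean as an attribute. */
    static FieldMeta readLegacyField(String name, boolean oldHasNorms) {
        FieldMeta fi = new FieldMeta(name);
        if (!oldHasNorms) {
            fi.attributes.put(LEGACY_HAS_NORMS_KEY, "false");
        }
        return fi;
    }

    /** Stand-in for DocValues.emptyNumeric(): every document reads as 0. */
    static final IntToLongFunction EMPTY_NUMERIC = doc -> 0L;

    /** 4.x norms reader: check the attribute before opening on-disk norms. */
    static IntToLongFunction getNorms(FieldMeta fi, IntToLongFunction onDisk) {
        if ("false".equals(fi.attributes.get(LEGACY_HAS_NORMS_KEY))) {
            return EMPTY_NUMERIC;
        }
        return onDisk;
    }

    public static void main(String[] args) {
        IntToLongFunction onDisk = doc -> doc + 1L; // pretend per-document norms
        FieldMeta noNorms = readLegacyField("title", false);
        FieldMeta withNorms = readLegacyField("body", true);
        System.out.println(getNorms(noNorms, onDisk).applyAsLong(5));   // 0
        System.out.println(getNorms(withNorms, onDisk).applyAsLong(5)); // 6
    }
}
```

The design choice being illustrated is that only the 4.x readers know about the legacy flag; the shared FieldInfo no longer needs the boolean at all.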
To be able to safely do things like this, we must first fix LUCENE-5990 and
ensure that "fieldinfos in" == "fieldinfos out" for everything in the codec
API. Otherwise codecs cannot rely on things like attributes, and we can't do
such cleanups.
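The LUCENE-5990 requirement can be phrased as a round-trip property: whatever field metadata a codec is handed at write time must come back identical, attributes included, at read time; otherwise an attribute a codec sets, such as the "old boolean" marker proposed here, could silently disappear. A minimal sketch of that check, assuming a hypothetical FieldMeta with a name, a norms flag and string attributes; FieldMeta, write, read and roundTrips are invented, and the DataOutputStream layout is an arbitrary choice, not Lucene's file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch of the "fieldinfos in == fieldinfos out" invariant.
final class FieldInfosRoundTripSketch {

    /** Hypothetical field metadata: a name, a norms flag, and string attributes. */
    static final class FieldMeta {
        final String name;
        final boolean hasNorms;
        final Map<String, String> attributes;

        FieldMeta(String name, boolean hasNorms, Map<String, String> attributes) {
            this.name = name;
            this.hasNorms = hasNorms;
            this.attributes = new TreeMap<>(attributes); // defensive, ordered copy
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FieldMeta)) return false;
            FieldMeta other = (FieldMeta) o;
            return name.equals(other.name)
                    && hasNorms == other.hasNorms
                    && attributes.equals(other.attributes);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, hasNorms, attributes);
        }
    }

    /** Writer side: serializes every piece of metadata, attributes included. */
    private static byte[] write(FieldMeta fi) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(fi.name);
            out.writeBoolean(fi.hasNorms);
            out.writeInt(fi.attributes.size());
            for (Map.Entry<String, String> e : fi.attributes.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        return bytes.toByteArray();
    }

    /** Reader side: must reconstruct exactly what the writer was given. */
    private static FieldMeta read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            boolean hasNorms = in.readBoolean();
            int n = in.readInt();
            Map<String, String> attrs = new TreeMap<>();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                attrs.put(key, value);
            }
            return new FieldMeta(name, hasNorms, attrs);
        }
    }

    /** The invariant: what goes in must come back out unchanged. */
    static boolean roundTrips(FieldMeta fi) {
        try {
            return fi.equals(read(write(fi)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FieldMeta fi = new FieldMeta("title", true,
                Collections.singletonMap("legacy.hasNorms", "false"));
        System.out.println(roundTrips(fi)); // true
    }
}
```

In a real test suite this property would be checked against every codec, so a codec that drops or rewrites attributes fails immediately.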
> Replace FieldInfo.normsType with FieldInfo.hasNorms boolean
> -----------------------------------------------------------
>
> Key: LUCENE-6006
> URL: https://issues.apache.org/jira/browse/LUCENE-6006
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, Trunk
>
> Attachments: LUCENE-6006.patch
>
>
> I came across this precursor while working on LUCENE-6005:
> I think FieldInfo.normsType can only be null (field did not index
> norms) or DocValuesType.NUMERIC (it did). I'd like to simplify to
> just boolean hasNorms.
> This is a strange boolean, though: in theory it should be derived from
> {{indexed && omitNorms == false}}, but we have it for the exceptional
> case where every document in a segment hit an exception and never
> added norms. I think this is the only reason it exists? (In theory,
> such cases should result in 100% deleted segments, which IW should
> then drop ... but it seems dangerous to "rely" on that.)
> So I changed the indexing chain to just fill in the default (0) norms
> for all documents in such exceptional cases; this way, going forward
> (starting with 5.0 indices), we really don't need this hasNorms. But
> we still need it for pre-5.0 indices...
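The reasoning in the issue description can be sketched under the simplified model it describes: normsType is only ever null or NUMERIC, hasNorms "should" equal indexed && !omitNorms, and the only disagreement arises when every document in a segment hit an exception. HasNormsSketch, hasNorms, derivedHasNorms, NormsBuffer, flushOld and flushNew are hypothetical names, and the DocValuesType enum here is a trimmed stand-in, not Lucene's actual enum.

```java
import java.util.Arrays;

// Hypothetical sketch of the ideas in the issue description; none of these
// names are Lucene APIs.
final class HasNormsSketch {

    /** Trimmed, hypothetical stand-in for a doc-values type enum. */
    enum DocValuesType { NUMERIC, BINARY, SORTED }

    /** Collapsing the old nullable normsType: only null or NUMERIC is expected. */
    static boolean hasNorms(DocValuesType normsType) {
        if (normsType != null && normsType != DocValuesType.NUMERIC) {
            throw new IllegalStateException("unexpected normsType: " + normsType);
        }
        return normsType != null;
    }

    /** What hasNorms "should" be, derived from other per-field flags. */
    static boolean derivedHasNorms(boolean indexed, boolean omitNorms) {
        return indexed && !omitNorms;
    }

    /** Hypothetical per-field norms buffer for one in-memory segment. */
    static final class NormsBuffer {
        private final long[] norms; // Java zero-fills, so missing docs read as 0
        private boolean anyWritten;

        NormsBuffer(int maxDoc) {
            norms = new long[maxDoc];
        }

        void addNorm(int docId, long value) {
            norms[docId] = value;
            anyWritten = true;
        }

        /** Old behavior (simplified): if every doc hit an exception, write nothing. */
        long[] flushOld() {
            return anyWritten ? Arrays.copyOf(norms, norms.length) : null;
        }

        /** New behavior: always write maxDoc values, defaulting to 0. */
        long[] flushNew() {
            return Arrays.copyOf(norms, norms.length);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNorms(null) + " " + hasNorms(DocValuesType.NUMERIC)); // false true

        NormsBuffer allFailed = new NormsBuffer(3); // every doc hit an exception
        boolean derived = derivedHasNorms(true, false);
        // Old: the derived flag says true, but no norms exist, so a stored flag was needed.
        System.out.println(derived + " " + (allFailed.flushOld() != null)); // true false
        // New: norms always exist, so the derived flag is accurate.
        System.out.println(derived + " " + Arrays.toString(allFailed.flushNew())); // true [0, 0, 0]
    }
}
```

Once every segment written from 5.0 on goes through the flushNew path, the stored boolean carries no extra information for those segments; only pre-5.0 segments can still have the mismatch.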
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)