[jira] [Commented] (LUCENE-6006) Replace FieldInfo.normsType with FieldInfo.hasNorms boolean

Robert Muir (JIRA) Tue, 14 Oct 2014 04:09:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170780#comment-14170780 ]


Robert Muir commented on LUCENE-6006:
-------------------------------------

{quote}
Well, we can't keep compromising Lucene's internal design for the "least common 
denominator" of codecs out there. If you are one of the apps hitting this 
exotic use case, you'll need to use a codec that can sparse-encode your norms.
{quote}

I don't see it as a design compromise so much as pushing the burden of all the
"crazy things users do" into the codec: that makes every codec more difficult to
write, because if a codec doesn't incorporate this optimization, users with
100,000 fields will complain.
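To make that burden concrete, here is a toy sketch of the kind of optimization a codec would have to implement on its own: skip storing norms entirely for a field whose norms are all the default value 0. This is NOT Lucene's codec API; `ToyNormsCodec` and its methods are hypothetical, and the per-field storage is a plain dense array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy norms "codec" (not Lucene API). It stores nothing for a
// field whose norms are all the default (0), so 100,000 norm-less fields
// cost nothing; reads for such fields fall back to 0.
class ToyNormsCodec {
    // Only fields with at least one non-default norm are kept.
    private final Map<String, long[]> stored = new HashMap<>();

    void writeField(String field, long[] norms) {
        for (long n : norms) {
            if (n != 0) {
                stored.put(field, norms.clone());
                return;
            }
        }
        // Every norm is 0: skip this field entirely.
    }

    long getNorm(String field, int docId) {
        long[] norms = stored.get(field);
        return norms == null ? 0L : norms[docId];
    }

    int storedFieldCount() {
        return stored.size();
    }
}
```

A codec that lacks a check like this pays for every field, which is exactly the complaint users with very many fields would raise.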

Even so, the patch might be the right way to go. I'm just sad that it doesn't
actually clean this stuff up, and instead waits for it to "age away" in 6.x. I
would nuke this boolean in 5.x instead. That just means the 4.x
FieldInfosReader needs to set a codec attribute for the "old boolean", and the
4.x norms readers need to look for that attribute and, if they see it, return
DocValues.emptyNumeric() for the field.
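A minimal sketch of that back-compat scheme, using simplified stand-ins rather than the real Lucene classes: `AttrFieldInfo`, `Legacy4xFieldInfosReader`, `Legacy4xNormsReader` and the attribute key `legacy.noNorms` are all hypothetical. The legacy reader records the old boolean as an attribute, and the legacy norms reader returns an all-zero source for such fields, analogous to returning DocValues.emptyNumeric().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToLongFunction;

// Hypothetical stand-in for FieldInfo: a name plus codec attributes.
class AttrFieldInfo {
    final String name;
    private final Map<String, String> attributes = new HashMap<>();

    AttrFieldInfo(String name) {
        this.name = name;
    }

    void putAttribute(String key, String value) {
        attributes.put(key, value);
    }

    String getAttribute(String key) {
        return attributes.get(key);
    }
}

// Hypothetical 4.x field-infos reader: instead of exposing the old
// boolean on FieldInfo, it records "no norms" as a codec attribute.
class Legacy4xFieldInfosReader {
    static final String NO_NORMS_KEY = "legacy.noNorms"; // illustrative key

    AttrFieldInfo read(String name, boolean oldHasNormsFlag) {
        AttrFieldInfo fi = new AttrFieldInfo(name);
        if (!oldHasNormsFlag) {
            fi.putAttribute(NO_NORMS_KEY, "true");
        }
        return fi;
    }
}

// Hypothetical 4.x norms reader: when the attribute is present, every
// document reads norm 0 (analogous to DocValues.emptyNumeric()).
class Legacy4xNormsReader {
    private final Map<String, long[]> normsOnDisk;

    Legacy4xNormsReader(Map<String, long[]> normsOnDisk) {
        this.normsOnDisk = normsOnDisk;
    }

    IntToLongFunction getNorms(AttrFieldInfo fi) {
        if (fi.getAttribute(Legacy4xFieldInfosReader.NO_NORMS_KEY) != null) {
            return doc -> 0L;
        }
        long[] values = normsOnDisk.get(fi.name);
        return doc -> values[doc];
    }
}
```

The design point is that only the 4.x reader pair knows about the old flag; 5.x code paths never see it. This only works if attributes written by the field-infos reader reliably reach the norms reader.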

To do things like this safely, we must fix LUCENE-5990 first and ensure that
"fieldinfos in" == "fieldinfos out" for everything in the codec API. Otherwise
codecs cannot rely on things like attributes, and we can't do such cleanups.



> Replace FieldInfo.normsType with FieldInfo.hasNorms boolean
> -----------------------------------------------------------
>
>                 Key: LUCENE-6006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6006
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6006.patch
>
>
> I came across this precursor while working on LUCENE-6005:
> I think FieldInfo.normsType can only be null (field did not index
> norms) or DocValuesType.NUMERIC (it did).  I'd like to simplify to
> just boolean hasNorms.
> This is a strange boolean, though: in theory it should be derived from
> {{indexed && omitNorms == false}}, but we have it for the exceptions
> case where every document in a segment hit an exception and never
> added norms.  I think this is the only reason it exists?  (In theory,
> such cases should result in 100% deleted segments, which IW should
> then drop ... but it seems dangerous to "rely" on that).
> So I changed the indexing chain to just fill in the default (0) norms
> for all documents in such exceptional cases; this way going forward
> (starting with 5.0 indices) we really don't need this hasNorms.  But
> we still need it for pre-5.0 indices...
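The simplification described in the issue can be sketched as three small functions. This is a hedged illustration, not the patch itself: `NormsSketch`, its methods, and the cut-down `DocValuesType` enum are hypothetical stand-ins. It shows that a normsType restricted to null or NUMERIC carries one bit, that for 5.0+ indices the bit follows from `indexed && omitNorms == false`, and that docs that hit an exception get the default (0) norm.

```java
// Hypothetical sketch; names are illustrative, not Lucene's classes.
final class NormsSketch {
    // Simplified stand-in for the old type of FieldInfo.normsType.
    enum DocValuesType { NUMERIC, BINARY, SORTED, SORTED_SET }

    // Pre-5.0: normsType was only ever null (no norms) or NUMERIC,
    // so it collapses to a single boolean.
    static boolean hasNormsFromLegacy(DocValuesType normsType) {
        return normsType == DocValuesType.NUMERIC;
    }

    // 5.0+: derivable from the field's own flags.
    static boolean hasNorms(boolean indexed, boolean omitNorms) {
        return indexed && omitNorms == false;
    }

    // Indexing-chain side: a null entry marks a document that hit an
    // exception and never produced a norm; give it the default (0).
    static long[] fillDefaultNorms(Long[] perDocNorms) {
        long[] out = new long[perDocNorms.length];
        for (int doc = 0; doc < perDocNorms.length; doc++) {
            out[doc] = perDocNorms[doc] == null ? 0L : perDocNorms[doc];
        }
        return out;
    }
}
```

Because every document ends up with a norm, a 5.0+ segment never needs the separate flag; only `hasNormsFromLegacy` survives for pre-5.0 indices.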



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
