[
https://issues.apache.org/jira/browse/LUCENE-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855964#comment-16855964
]
Adrien Grand commented on LUCENE-8830:
--------------------------------------
The logic looks fine to me; the omitNorms flag is just set later in the
process. That said, the logic is a bit complicated, so I could have missed
something.
I tried to reproduce the bug you are describing without success. Can you
provide us with a test case?
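
To illustrate what "set later in the process" could mean, here is a minimal, self-contained sketch. It is a toy model, not Lucene's actual code: ToyFieldInfo, ToyFieldInfos, invert and writesNorms are hypothetical names, and the assumption that the flag is copied from the field type during a later per-field step is made only for illustration. It shows that an object created with omitNorms=false can still end up omitting norms, as long as the later step runs before norms are written.

```java
import java.util.HashMap;
import java.util.Map;

public class OmitNormsSketch {

    /** Hypothetical stand-in for per-field metadata; omitNorms defaults to false. */
    static final class ToyFieldInfo {
        final String name;
        private boolean omitNorms = false;

        ToyFieldInfo(String name) {
            this.name = name;
        }

        void setOmitsNorms() {
            omitNorms = true;
        }

        boolean omitsNorms() {
            return omitNorms;
        }
    }

    /** Hypothetical registry: creates a field with default flags or returns the existing one. */
    static final class ToyFieldInfos {
        private final Map<String, ToyFieldInfo> byName = new HashMap<>();

        ToyFieldInfo getOrAdd(String name) {
            return byName.computeIfAbsent(name, ToyFieldInfo::new);
        }
    }

    /** Assumed later step: copies the omitNorms setting from the field type onto the field. */
    static void invert(ToyFieldInfo fi, boolean fieldTypeOmitsNorms) {
        if (fieldTypeOmitsNorms) {
            fi.setOmitsNorms();
        }
    }

    /** Norms are written only for fields that do not omit them. */
    static boolean writesNorms(ToyFieldInfo fi) {
        return !fi.omitsNorms();
    }

    public static void main(String[] args) {
        ToyFieldInfos infos = new ToyFieldInfos();

        // First pass: the field is created with default flags.
        ToyFieldInfo fi = infos.getOrAdd("body");
        System.out.println("after getOrAdd, omitsNorms = " + fi.omitsNorms());

        // Later pass: the field type's setting is applied.
        invert(fi, true);
        System.out.println("after invert, omitsNorms = " + fi.omitsNorms());
        System.out.println("writesNorms = " + writesNorms(fi));
    }
}
```

In this model the bug described in the issue would only appear if the later step were skipped, or if norms were written before it ran, which is what a reproducing test case would need to demonstrate.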
> DefaultIndexingChain.getOrAddField method ignores omitNorms from FieldType
> --------------------------------------------------------------------------
>
> Key: LUCENE-8830
> URL: https://issues.apache.org/jira/browse/LUCENE-8830
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 6.6.1
> Reporter: Ishan Sri
> Priority: Major
>
> Norms are being computed and written even when *omitNorms is set to true* in
> the fieldTypes. I chased the issue and found that the method *getOrAddField*
> tries to create a *FieldInfo* object in the 1st pass. By default, this object
> has omitNorms set to false. The method sets the *indexOptions* as specified in
> the fieldType on this newly created object but doesn't do the same for
> *omitNorms*. This effectively overrides the flag, which creates issues down
> the line.
>
> Here's the code snippet for the method with the *fieldInfos.getOrAdd* call
>
>
> {code:java}
> private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {
>   // Make sure we have a PerField allocated
>   final int hashPos = name.hashCode() & hashMask;
>   PerField fp = fieldHash[hashPos];
>   while (fp != null && !fp.fieldInfo.name.equals(name)) {
>     fp = fp.next;
>   }
>   if (fp == null) {
>     // First time we are seeing this field in this segment
>     FieldInfo fi = fieldInfos.getOrAdd(name);
>     // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the
>     // initial IndexOptions to decide what arrays it must create). Then, we also must
>     // set it in PerField.invert to allow for later downgrading of the index options:
>     fi.setIndexOptions(fieldType.indexOptions());
>     fp = new PerField(fi, invert);
>     ...
> {code}
>
>
>
> The *getOrAdd* method below instantiates a new object with omitNorms set to
> false as the 4th parameter.
>
> {code:java}
> /** Create a new field, or return existing one. */
> public FieldInfo getOrAdd(String name) {
>   FieldInfo fi = fieldInfo(name);
>
>   if (fi == null) {
>     // This field wasn't yet added to this in-RAM
>     // segment's FieldInfo, so now we get a global
>     // number for this field. If the field was seen
>     // before then we'll get the same name and number,
>     // else we'll allocate a new one:
>     final int fieldNumber = globalFieldNumbers.addOrGet(name, -1,
>         DocValuesType.NONE, 0, 0);
>
>     fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE,
>         DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
>     assert !byName.containsKey(fi.name);
>     globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name,
>         DocValuesType.NONE);
>     byName.put(fi.name, fi);
>   }
>   return fi;
> }
> {code}
>
> This causes norms to always be computed, which not only produces incorrect
> scores but also increases disk usage when many documents have multiple
> fields for which this flag is set to true but ignored.