[jira] [Commented] (LUCENE-8830) DefaultIndexingChain.getOrAddField method ignores omitNorms from FieldType

Adrien Grand (JIRA) Tue, 04 Jun 2019 11:09:07 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855964#comment-16855964 ]


Adrien Grand commented on LUCENE-8830:
--------------------------------------

The logic looks fine to me: the omitNorms flag is just set later in the 
process. That said, that logic is a bit complicated, so I could have missed 
something.
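
To make "set later in the process" concrete, here is a toy model only. It is 
not Lucene's code: ToyFieldInfo, ToyFieldType, and wouldWriteNorms are 
hypothetical names, and the assumption that a later invert-style pass applies 
the flag is inferred from the comment in the quoted getOrAddField snippet, 
which says the index options are also set in PerField.invert.

```java
import java.util.HashMap;
import java.util.Map;

public class OmitNormsSketch {

    /** Hypothetical stand-in for per-field metadata; not Lucene's FieldInfo. */
    static final class ToyFieldInfo {
        final String name;
        boolean indexed;    // stand-in for "index options are not NONE"
        boolean omitNorms;  // false by default, as in the report

        ToyFieldInfo(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the options a field type carries. */
    static final class ToyFieldType {
        final boolean indexed;
        final boolean omitNorms;

        ToyFieldType(boolean indexed, boolean omitNorms) {
            this.indexed = indexed;
            this.omitNorms = omitNorms;
        }
    }

    private final Map<String, ToyFieldInfo> byName = new HashMap<>();

    /** Step 1: create the record with defaults and copy only the index options. */
    ToyFieldInfo getOrAddField(String name, ToyFieldType type) {
        ToyFieldInfo fi = byName.computeIfAbsent(name, ToyFieldInfo::new);
        fi.indexed = type.indexed;
        return fi;
    }

    /** Step 2 (assumed): a later per-document pass applies omitNorms. */
    void invert(ToyFieldInfo fi, ToyFieldType type) {
        if (type.omitNorms) {
            fi.omitNorms = true;
        }
    }

    /** In this model, norms are written only for indexed fields that do not omit them. */
    static boolean wouldWriteNorms(ToyFieldInfo fi) {
        return fi.indexed && !fi.omitNorms;
    }

    public static void main(String[] args) {
        OmitNormsSketch chain = new OmitNormsSketch();
        ToyFieldType type = new ToyFieldType(true, true);

        ToyFieldInfo fi = chain.getOrAddField("body", type);
        System.out.println("after getOrAddField: wouldWriteNorms=" + wouldWriteNorms(fi));

        chain.invert(fi, type);
        System.out.println("after invert:        wouldWriteNorms=" + wouldWriteNorms(fi));
    }
}
```

In this model the flag is still at its default right after getOrAddField, 
which matches what the report observes, but it is corrected before anything 
would decide whether to write norms. Whether the real indexing chain behaves 
the same way on 6.6.1 is exactly what a test case would settle.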

I tried to reproduce the bug you are describing without success. Can you 
provide us with a test case?

> DefaultIndexingChain.getOrAddField method ignores omitNorms from FieldType
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-8830
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8830
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 6.6.1
>            Reporter: Ishan Sri
>            Priority: Major
>
> Norms are being computed and written even when *omitNorms is set to true* in 
> the fieldTypes. I traced the issue and found that the method *getOrAddField* 
> creates a *FieldInfo* object on the first pass. By default this object 
> has omitNorms set to false. The method sets the *indexOptions* specified in 
> the fieldType on this newly created object but doesn't do the same for 
> *omitNorms*. This effectively overrides the flag, which creates issues down 
> the line.
>  
> Here's the code snippet for the method with the *fieldInfos.getOrAdd* call 
>  
>  
> {code:java}
> private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {
>   // Make sure we have a PerField allocated
>   final int hashPos = name.hashCode() & hashMask;
>   PerField fp = fieldHash[hashPos];
>   while (fp != null && !fp.fieldInfo.name.equals(name)) {
>     fp = fp.next;
>   }
>   if (fp == null) {
>     // First time we are seeing this field in this segment
>     FieldInfo fi = fieldInfos.getOrAdd(name);
>     // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks
>     // at the initial IndexOptions to decide what arrays it must create). Then,
>     // we also must set it in PerField.invert to allow for later downgrading of
>     // the index options:
>     fi.setIndexOptions(fieldType.indexOptions());
>     fp = new PerField(fi, invert);
>     ...
> {code}
>  
>  
>  
> The *getOrAdd* method below instantiates a new object with omitNorms set to 
> false as the 4th parameter.
>  
> {code:java}
> /** Create a new field, or return existing one. */
> public FieldInfo getOrAdd(String name) {
>   FieldInfo fi = fieldInfo(name);
>   if (fi == null) {
>     // This field wasn't yet added to this in-RAM
>     // segment's FieldInfo, so now we get a global
>     // number for this field. If the field was seen
>     // before then we'll get the same name and number,
>     // else we'll allocate a new one:
>     final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);
>     fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE,
>         DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
>     assert !byName.containsKey(fi.name);
>     globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
>     byName.put(fi.name, fi);
>   }
>   return fi;
> }
> {code}
>  
> This causes norms to always be computed, which not only produces incorrect 
> scores but also increases disk usage when many documents have 
> multiple fields with this flag set to true but ignored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

