dnhatn commented on PR #15125:
URL: https://github.com/apache/lucene/pull/15125#issuecomment-3225835031

   Thanks Luca!
   
   > To make sure I understand the issue 100%: when there is an inconsistency, 
an IAE is thrown as part of initializeFieldInfo, which though leaves the 
PerField instance without field info set, which when accessed later causes 
problems, especially when consulting the doc values type to sort the segments 
if index sorting is enabled. Correct?
   
   In other places before accessing fieldInfo, we check if the corresponding 
writer (e.g., pointValuesWriter, docValuesWriter) is null. We overlooked this 
null check when building comparators for index sorting.
   
   > Are there other scenarios where the same could happen, that a PerField 
instance has no field info set to it, and is index sorting the only place where 
this causes issues? This code seems pretty delicate and it seems like things 
may go wrong in other ways, but I can't tell exactly.
   
   I've reviewed the IndexingChain class, and this appears to be the only place 
where we missed the check.
   
   > Your fix handles the null field info by checking the doc values writer. 
Could you expand on why checking docValuesWriter instead of checking field info 
directly? Are there circumstances where you'd expect the former to be null but 
the latter non null?
   
   The fieldInfo is set just before the corresponding writer is initialized. 
Therefore, if the writer is non-null, the fieldInfo will also be non-null.
   
   > Do we have a chance to consider a more comprehensive fix like delay adding 
the PerField to the perFieldHash until field info is set to it? I can't tell if 
there may be side effects of that though.
   
   Yes, I did consider that. I also explored the option of removing PerField 
that doesn't have fieldInfos after hitting an exception. However, both could be 
expensive and significantly slow down, especially when documents contain many 
fields. For this reason, I opted for the current approach.
   
   > Last but not least: with your fix, what is the effect on index sorting of 
returning null when field info is null?
   
   Returning null for an index sort field can occur in two scenarios:
   
   1. The value for the sort field never appears in the documents of the 
current DWPT.
   2. A schema exception is encountered, and other documents do not contain the 
sort field.
   
   Since we do not index fields from invalid documents, the second case is 
safely treated the same as the first - meaning there is no value for the sort 
field.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to