dnhatn commented on PR #15125: URL: https://github.com/apache/lucene/pull/15125#issuecomment-3225835031
Thanks Luca! > To make sure I understand the issue 100%: when there is an inconsistency, an IAE is thrown as part of initializeFieldInfo, which though leaves the PerField instance without field info set, which when accessed later causes problems, especially when consulting the doc values type to sort the segments if index sorting is enabled. Correct? In other places before accessing fieldInfo, we check if the corresponding writer (e.g., pointValuesWriter, docValuesWriter) is null. We overlooked this null check when building comparators for index sorting. > Are there other scenarios where the same could happen, that a PerField instance has no field info set to it, and is index sorting the only place where this causes issues? This code seems pretty delicate and it seems like things may go wrong in other ways, but I can't tell exactly. I've reviewed the IndexingChain class, and this appears to be the only place where we missed the check. > Your fix handles the null field info by checking the doc values writer. Could you expand on why checking docValuesWriter instead of checking field info directly? Are there circumstances where you'd expect the former to be null but the latter non null? The fieldInfo is set just before the corresponding writer is initialized. Therefore, if the writer is non-null, the fieldInfo will also be non-null. > Do we have a chance to consider a more comprehensive fix like delay adding the PerField to the perFieldHash until field info is set to it? I can't tell if there may be side effects of that though. Yes, I did consider that. I also explored the option of removing PerField that doesn't have fieldInfos after hitting an exception. However, both could be expensive and significantly slow down, especially when documents contain many fields. For this reason, I opted for the current approach. > Last but not least: with your fix, what is the effect on index sorting of returning null when field info is null? Returning null for an index sort field can occur in two scenarios: 1. The value for the sort field never appears in the documents of the current DWPT. 2. A schema exception is encountered, and other documents do not contain the sort field. Since we do not index fields from invalid documents, the second case is safely treated the same as the first - meaning there is no value for the sort field. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org