atris commented on pull request #7638:
URL: https://github.com/apache/pinot/pull/7638#issuecomment-954542805


   >     * For offline, in `LuceneTextIndexReader`, we build a mapping file 
(luceneDocId -> pinotDocId) during segment load to avoid expensive retrieval of 
the entire Lucene document. That file is built by iterating over numDocs in the 
Pinot segment, which is equal to numDocs in the Lucene index for an SV column. 
For an MV column, this is not true, since for each Pinot doc we add Lucene docs 
equal to the length of the array. So, during query processing, when 
`LuceneDocIdCollector` looks up the mapping file in `DocIdTranslator`, it can 
seg fault, as Lucene will return a docId > max Pinot docId.
   
   Would it? `DocIdTranslator` looks at the DocID field in the returned 
document, and even for MV fields we have a single DocID field with a single 
value.
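
To make the two readings of this concern concrete, here is a minimal, self-contained sketch. It does not use Pinot or Lucene code. The class `DocIdMappingSketch`, its methods, and both index layouts are hypothetical and only model the two assumptions under discussion: layout A adds one Lucene doc per MV value, while layout B adds one Lucene doc per Pinot doc with a single stored DocID. In both cases the mapping is assumed to be built with numDocs (Pinot) entries.

```java
import java.util.Arrays;

// Hypothetical model of a luceneDocId -> pinotDocId mapping; not Pinot's implementation.
class DocIdMappingSketch {

    // Layout A (assumption): one Lucene doc per MV value, each storing the
    // owning Pinot docId. valuesPerPinotDoc[i] is the array length of Pinot doc i.
    static int[] onePerValue(int[] valuesPerPinotDoc) {
        int total = Arrays.stream(valuesPerPinotDoc).sum();
        int[] stored = new int[total];
        int luceneDocId = 0;
        for (int pinotDocId = 0; pinotDocId < valuesPerPinotDoc.length; pinotDocId++) {
            for (int v = 0; v < valuesPerPinotDoc[pinotDocId]; v++) {
                stored[luceneDocId++] = pinotDocId;
            }
        }
        return stored;
    }

    // Layout B (assumption): one Lucene doc per Pinot doc, with a single DocID value.
    static int[] onePerPinotDoc(int numPinotDocs) {
        int[] stored = new int[numPinotDocs];
        for (int i = 0; i < numPinotDocs; i++) {
            stored[i] = i;
        }
        return stored;
    }

    // Builds the mapping from the first `entries` Lucene docs only.
    static int[] buildMapping(int[] storedPinotDocIds, int entries) {
        return Arrays.copyOf(storedPinotDocIds, entries);
    }

    // Looks up the Pinot docId; throws if luceneDocId is beyond the mapping.
    static int translate(int[] mapping, int luceneDocId) {
        return mapping[luceneDocId];
    }

    public static void main(String[] args) {
        int[] valuesPerPinotDoc = {1, 3, 2}; // 3 Pinot docs, 6 values in total
        int numPinotDocs = valuesPerPinotDoc.length;

        int[] layoutA = onePerValue(valuesPerPinotDoc);
        int[] layoutB = onePerPinotDoc(numPinotDocs);
        System.out.println("Lucene docs, layout A: " + layoutA.length);
        System.out.println("Lucene docs, layout B: " + layoutB.length);

        // Layout B: every Lucene docId falls inside a numDocs-sized mapping.
        int[] mappingB = buildMapping(layoutB, numPinotDocs);
        System.out.println("Layout B, last doc -> " + translate(mappingB, numPinotDocs - 1));

        // Layout A: the last Lucene docId lies outside a numDocs-sized mapping.
        int[] mappingA = buildMapping(layoutA, numPinotDocs);
        try {
            translate(mappingA, layoutA.length - 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Layout A, last doc -> out of bounds");
        }
    }
}
```

Under layout B the lookup stays in range, which is the case the question above assumes. Under layout A a mapping sized by Pinot's numDocs is too short, which is the failure the reviewer describes.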
   
   >     * For realtime, can you also add support for this new interface in 
`RealtimeLuceneTextIndexReader`, which acts as both reader and writer and uses 
Lucene NRT search? A separate PR is also fine.
   
   I will follow up with a PR for that.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



