Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400372543
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##########
 @@ -169,5 +178,49 @@ public void close()
       throws IOException {
     _indexReader.close();
     _indexDirectory.close();
+    _docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+    private PinotDataBuffer _buffer;
+
+    DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws Exception {
+      int length = Integer.BYTES * numDocs;
+      File docIdMappingFile = new File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+          column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+      // For newly added segments, this file will not exist.
+      // For segment refresh, segment reload and server restart, file will exist,
+      // but we don't know if we are here for refresh v/s reload v/s restart.
+      // In case of refresh, we have to build the mapping again, but in case of
+      // reload and restart, we don't. Also, reload has a sub-case where this text index
+      // was indeed created during reload (user enabled on existing or newly added column).
+      // Since there is no way to distinguish why we are here, we build the mapping again
+      // regardless.
+      // TODO: see if we can prefetch the pages
+      _buffer =
+          PinotDataBuffer.mapFile(docIdMappingFile, false, 0, length, ByteOrder.BIG_ENDIAN, getClass().getSimpleName());
+    }
+
+    public void buildDocIdMapping(int numDocs) {
+      for (int i = 0; i < numDocs; i++) {
+        try {
+          Document document = _indexSearcher.doc(i);
+          int pinotDocId = Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+          _buffer.putInt(i * Integer.BYTES, pinotDocId);
+        } catch (Exception e) {
+          LOGGER.error("Failed to build doc id mapping during segment load for column:{},docID:{},error:{}. Will continue and build mapping on the fly",
 
 Review comment:
   Throw this exception instead of logging an ERROR and continuing. If this step fails, the JVM will crash when it later reads the buffer.
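A small standalone sketch of the suggested fail-fast behavior follows. It is hypothetical: `DocIdMappingSketch` is not part of the PR, a heap `java.nio.ByteBuffer` stands in for the memory-mapped `PinotDataBuffer`, and a `List<String>` stands in for the stored Pinot doc id values that the diff reads via `_indexSearcher.doc(i)`. It keeps the diff's layout (one 4-byte int per Lucene doc at offset `i * Integer.BYTES`) but rethrows the first failure with the column and Lucene doc id instead of logging and continuing.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

/**
 * Hypothetical sketch (not the PR's code): Lucene docId -> Pinot docId mapping that
 * fails fast on the first bad entry. A heap ByteBuffer stands in for the mapped
 * PinotDataBuffer; a List of stored field values stands in for Lucene Documents.
 */
final class DocIdMappingSketch {
  private final ByteBuffer _buffer;
  private final int _numDocs;

  DocIdMappingSketch(int numDocs) {
    _numDocs = numDocs;
    // One 4-byte int per Lucene doc, mirroring Integer.BYTES * numDocs in the diff.
    _buffer = ByteBuffer.allocate(Integer.BYTES * numDocs).order(ByteOrder.BIG_ENDIAN);
  }

  /**
   * storedPinotDocIds.get(i) plays the role of
   * document.get(LUCENE_INDEX_DOC_ID_COLUMN_NAME) for Lucene doc i.
   * Any failure is rethrown with context, so loading stops instead of
   * leaving a partially populated buffer behind.
   */
  void buildDocIdMapping(String column, List<String> storedPinotDocIds) {
    for (int i = 0; i < _numDocs; i++) {
      try {
        int pinotDocId = Integer.parseInt(storedPinotDocIds.get(i));
        _buffer.putInt(i * Integer.BYTES, pinotDocId);
      } catch (Exception e) {
        throw new IllegalStateException(
            "Failed to build doc id mapping for column: " + column + ", Lucene docId: " + i, e);
      }
    }
  }

  /** Reads the Pinot docId stored at offset luceneDocId * Integer.BYTES. */
  int getPinotDocId(int luceneDocId) {
    return _buffer.getInt(luceneDocId * Integer.BYTES);
  }

  public static void main(String[] args) {
    DocIdMappingSketch mapping = new DocIdMappingSketch(3);
    mapping.buildDocIdMapping("text_col", List.of("7", "2", "5"));
    System.out.println(mapping.getPinotDocId(1)); // prints 2

    DocIdMappingSketch broken = new DocIdMappingSketch(2);
    try {
      broken.buildDocIdMapping("text_col", List.of("1", "not-a-number"));
    } catch (IllegalStateException e) {
      // prints: Failed to build doc id mapping for column: text_col, Lucene docId: 1
      System.out.println(e.getMessage());
    }
  }
}
```

With fail-fast, a segment whose mapping cannot be built never finishes loading, rather than being served with unpopulated slots. In this heap-buffer sketch a skipped slot would silently read back as 0, a valid-looking but wrong Pinot doc id; for the off-heap mapped buffer in the PR, the reviewer warns the consequence is a JVM crash.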

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 