Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400372543
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
##########
@@ -169,5 +178,49 @@ public void close()
throws IOException {
_indexReader.close();
_indexDirectory.close();
+ _docIdReaderWriter.close();
+ }
+
+ private class DocIdReaderWriter implements Closeable {
+ private PinotDataBuffer _buffer;
+
+ DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws Exception {
+ int length = Integer.BYTES * numDocs;
+ File docIdMappingFile = new File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+ column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+ // For newly added segments, this file will not exist.
+ // For segment refresh, segment reload and server restart, file will exist,
+ // but we don't know if we are here for refresh v/s reload v/s restart.
+ // In case of refresh, we have to build the mapping again, but in case of
+ // reload and restart, we don't. Also, reload has a sub-case where this text index
+ // was indeed created during reload (user enabled on existing or newly added column).
+ // Since there is no way to distinguish why we are here, we build the mapping again
+ // regardless.
+ // TODO: see if we can prefetch the pages
+ _buffer =
+ PinotDataBuffer.mapFile(docIdMappingFile, false, 0, length, ByteOrder.BIG_ENDIAN, getClass().getSimpleName());
+ }
+
+ public void buildDocIdMapping(int numDocs) {
+ for (int i = 0; i < numDocs; i++) {
+ try {
+ Document document = _indexSearcher.doc(i);
+ int pinotDocId = Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+ _buffer.putInt(i * Integer.BYTES, pinotDocId);
+ } catch (Exception e) {
+ LOGGER.error("Failed to build doc id mapping during segment load for column:{},docID:{},error:{}. Will continue and build mapping on the fly",
Review comment:
Rethrow this exception instead of logging an ERROR. If this step fails, the
JVM will crash when reading the buffer.
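
A minimal sketch of the suggestion, not the actual Pinot code: the loop below propagates the first failure instead of logging it and moving on. `DocIdMappingSketch` and `DocIdSource` are hypothetical stand-ins for the reader and the Lucene lookup, and a heap `ByteBuffer` stands in for the memory-mapped `PinotDataBuffer`; wrapping the cause in a `RuntimeException` is an assumption about how the failure could be surfaced.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DocIdMappingSketch {

  /** Hypothetical stand-in for looking up the Pinot docId stored in a Lucene document. */
  public interface DocIdSource {
    int pinotDocId(int luceneDocId) throws Exception;
  }

  /**
   * Builds the luceneDocId -> pinotDocId mapping, one int per Lucene document.
   * Any lookup failure is rethrown so the caller never sees a partially
   * filled buffer.
   */
  public static ByteBuffer buildDocIdMapping(DocIdSource source, int numDocs) {
    // Assumption: a heap buffer replaces the memory-mapped file used in the PR.
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES * numDocs).order(ByteOrder.BIG_ENDIAN);
    for (int luceneDocId = 0; luceneDocId < numDocs; luceneDocId++) {
      try {
        buffer.putInt(luceneDocId * Integer.BYTES, source.pinotDocId(luceneDocId));
      } catch (Exception e) {
        // Fail the load here rather than logging and continuing.
        throw new RuntimeException("Failed to build doc id mapping for luceneDocId: " + luceneDocId, e);
      }
    }
    return buffer;
  }
}
```

With this shape, a failed segment load is reported once at load time, and readers of the buffer can rely on every slot having been written.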