Jackie-Jiang commented on code in PR #9333:
URL: https://github.com/apache/pinot/pull/9333#discussion_r976058172
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -174,18 +175,46 @@ private void createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col
// segmentDirectory is indicated to us by SegmentDirectoryPaths, we create lucene index there. There is no
// further need to move around the lucene index directory since it is created with correct directory structure
// based on segmentVersion.
- try (ForwardIndexReader forwardIndexReader = LoaderUtils.getForwardIndexReader(segmentWriter, columnMetadata);
- ForwardIndexReaderContext readerContext = forwardIndexReader.createContext();
- TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator(IndexCreationContext.builder()
- .withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build().forTextIndex(_fstType, true))) {
- if (columnMetadata.isSingleValue()) {
- processSVField(segmentWriter, hasDictionary, forwardIndexReader, readerContext, textIndexCreator, numDocs,
- columnMetadata);
+ try (TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator(
+ IndexCreationContext.builder().withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build()
+ .forTextIndex(_fstType, true))) {
+ boolean forwardIndexDisabled = !segmentWriter.hasIndexFor(columnName, ColumnIndexType.FORWARD_INDEX);
+ if (forwardIndexDisabled) {
+ try (Dictionary dictionary = LoaderUtils.getDictionary(segmentWriter, columnMetadata)) {
+ // Create the text index if the dictionary length is 1 as this is for a default column (i.e. newly added
+ // column). For existing columns it is not possible to create the text index without forward index
+ Preconditions.checkState(dictionary.length() == 1, String.format("Creating text index for forward index "
Review Comment:
I don't follow the second approach. For a default column, where all the
values in the column are the same, we should create a sorted index and a
single-entry dictionary. This is the same behavior as generating a new
segment with a column whose values are all the same.
IMO the second approach will actually cause inconsistent behavior, because
the column will have an inverted index with only a single entry, and the
segment generated during the reload will differ from one generated from the
raw data without the column.
I strongly suggest generating a sorted index + dictionary for the default
column.
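
To illustrate the consistency argument, here is a minimal sketch. It
assumes a sorted index can be modeled as one inclusive
[startDocId, endDocId] range per dictionary id. The names
DefaultColumnSketch, SortedColumn, buildFromSortedValues and
buildDefaultColumn are hypothetical and are not Pinot classes or APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these types exist in Pinot. */
final class DefaultColumnSketch {

  /** Dictionary plus sorted index: one inclusive [startDocId, endDocId] range per dictId. */
  static final class SortedColumn {
    final String[] dictionary;
    final int[][] docIdRanges;

    SortedColumn(String[] dictionary, int[][] docIdRanges) {
      this.dictionary = dictionary;
      this.docIdRanges = docIdRanges;
    }
  }

  /** Builds the column from already-sorted values, as segment generation from raw data would. */
  static SortedColumn buildFromSortedValues(String[] values) {
    List<String> dictionary = new ArrayList<>();
    List<int[]> ranges = new ArrayList<>();
    for (int docId = 0; docId < values.length; docId++) {
      boolean newValue = dictionary.isEmpty()
          || !dictionary.get(dictionary.size() - 1).equals(values[docId]);
      if (newValue) {
        // First doc carrying this value: open a new dictionary entry and range.
        dictionary.add(values[docId]);
        ranges.add(new int[]{docId, docId});
      } else {
        // Same value as the previous doc: extend the current range.
        ranges.get(ranges.size() - 1)[1] = docId;
      }
    }
    return new SortedColumn(dictionary.toArray(new String[0]), ranges.toArray(new int[0][]));
  }

  /** Builds a default column directly (assumes numDocs > 0): one dictionary entry covering all docs. */
  static SortedColumn buildDefaultColumn(String defaultValue, int numDocs) {
    return new SortedColumn(new String[]{defaultValue}, new int[][]{{0, numDocs - 1}});
  }

  public static void main(String[] args) {
    SortedColumn fromRaw = buildFromSortedValues(new String[]{"null", "null", "null", "null"});
    SortedColumn fromReload = buildDefaultColumn("null", 4);
    System.out.println(Arrays.equals(fromRaw.dictionary, fromReload.dictionary)
        && Arrays.deepEquals(fromRaw.docIdRanges, fromReload.docIdRanges));
  }
}
```

In this model both paths produce the same dictionary and the same doc id
range, which is the consistency I am asking for between reload and
fresh segment generation.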
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]