somandal commented on code in PR #9333:
URL: https://github.com/apache/pinot/pull/9333#discussion_r974859046
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -174,18 +175,46 @@ private void
createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col
// segmentDirectory is indicated to us by SegmentDirectoryPaths, we create
lucene index there. There is no
// further need to move around the lucene index directory since it is
created with correct directory structure
// based on segmentVersion.
- try (ForwardIndexReader forwardIndexReader =
LoaderUtils.getForwardIndexReader(segmentWriter, columnMetadata);
- ForwardIndexReaderContext readerContext =
forwardIndexReader.createContext();
- TextIndexCreator textIndexCreator =
indexCreatorProvider.newTextIndexCreator(IndexCreationContext.builder()
-
.withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build().forTextIndex(_fstType,
true))) {
- if (columnMetadata.isSingleValue()) {
- processSVField(segmentWriter, hasDictionary, forwardIndexReader,
readerContext, textIndexCreator, numDocs,
- columnMetadata);
+ try (TextIndexCreator textIndexCreator =
indexCreatorProvider.newTextIndexCreator(
+
IndexCreationContext.builder().withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build()
+ .forTextIndex(_fstType, true))) {
+ boolean forwardIndexDisabled = !segmentWriter.hasIndexFor(columnName,
ColumnIndexType.FORWARD_INDEX);
+ if (forwardIndexDisabled) {
+ try (Dictionary dictionary = LoaderUtils.getDictionary(segmentWriter,
columnMetadata)) {
+ // Create the text index if the dictionary length is 1 as this is
for a default column (i.e. newly added
+ // column). For existing columns it is not possible to create the
text index without forward index
+ Preconditions.checkState(dictionary.length() == 1,
String.format("Creating text index for forward index "
Review Comment:
So we actually evaluated two approaches for this part:
- Create the forward index, allow it to be marked as sorted. This will skip
creation of the inverted index, and all other index handlers will kick in and
create the index off of the forward index we create. (some index handlers rely
on the forward index to construct the index)
- Our main concern with this approach was that the segments will be
inconsistent, some will have a forward index and others won't. This may give
odd behavior at query time depending on the segment touched (e.g. some queries
go fine with a given filter but don't go through with a slightly different
filter). We wanted to avoid this.
- We had also explored an option to identify such queries on the Broker
or Server planning stage but quickly found that this became very ugly and had
lots of exceptions and was error prone.
- Don't create the forward index but do create the dictionary (just like the
normal segment creation path). Modify the index handlers to create the index
for such a default column if forward index is disabled without relying on the
the forward index. In such cases we expect exactly 1 value anyways. This will
give a consistent experience to users who run queries.
- This has the disadvantage of doing some special handling in the index
handlers for this scenario since the forward index won't be available anymore.
Hope the above explains our thinking about the approaches and why we decided
to go with the second one. Let me know if you'd like to discuss this in more
detail. cc @siddharthteotia
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]