[GitHub] [pinot] siddharthteotia commented on a diff in pull request #9454: ForwardIndexHandler: Change compressionType during segmentReload

GitBox Wed, 28 Sep 2022 21:46:28 -0700


siddharthteotia commented on code in PR #9454:
URL: https://github.com/apache/pinot/pull/9454#discussion_r983059458



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -0,0 +1,273 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.loader;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import java.io.File;
+import java.math.BigDecimal;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.segment.readers.PinotSegmentColumnReader;
+import org.apache.pinot.segment.spi.ColumnMetadata;
+import org.apache.pinot.segment.spi.SegmentMetadata;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.compression.ChunkCompressionType;
+import org.apache.pinot.segment.spi.creator.IndexCreationContext;
+import org.apache.pinot.segment.spi.creator.IndexCreatorProvider;
+import org.apache.pinot.segment.spi.creator.SegmentVersion;
+import org.apache.pinot.segment.spi.index.creator.ForwardIndexCreator;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
+import org.apache.pinot.segment.spi.store.ColumnIndexType;
+import org.apache.pinot.segment.spi.store.SegmentDirectory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * Helper class used by {@link SegmentPreProcessor} to make changes to forward index and dictionary configs. Note
+ * that this handler only works for segment versions >= 3.0. Support for segment version < 3.0 is not added because
+ * majority of the usecases are in versions >= 3.0 and this avoids adding tech debt. The currently supported
+ * operations are:
+ * 1. Change compression on raw SV columns.
+ *
+ *  TODO: Add support for the following:
+ *  1. Change compression for raw MV columns
+ *  2. Enable dictionary
+ *  3. Disable dictionary
+ */
+public class ForwardIndexHandler implements IndexHandler {
+  private static final Logger LOGGER = LoggerFactory.getLogger(ForwardIndexHandler.class);
+
+  private final SegmentMetadata _segmentMetadata;
+  IndexLoadingConfig _indexLoadingConfig;
+
+  protected enum Operation {
+    // TODO: Add other operations like ENABLE_DICTIONARY, DISABLE_DICTIONARY.
+    CHANGE_RAW_INDEX_COMPRESSION_TYPE,
+  }
+
+  public ForwardIndexHandler(SegmentMetadata segmentMetadata, IndexLoadingConfig indexLoadingConfig) {
+    _segmentMetadata = segmentMetadata;
+    _indexLoadingConfig = indexLoadingConfig;
+  }
+
+  @Override
+  public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader)
+      throws Exception {
+    Map<String, Operation> columnOperationMap = computeOperation(segmentReader);
+    return !columnOperationMap.isEmpty();
+  }
+
+  @Override
+  public void updateIndices(SegmentDirectory.Writer segmentWriter, IndexCreatorProvider indexCreatorProvider)
+      throws Exception {
+    Map<String, Operation> columnOperationMap = computeOperation(segmentWriter);
+    if (columnOperationMap.isEmpty()) {
+      return;
+    }
+
+    for (Map.Entry<String, Operation> entry : columnOperationMap.entrySet()) {
+      String column = entry.getKey();
+      Operation operation = entry.getValue();
+
+      switch (operation) {
+        case CHANGE_RAW_INDEX_COMPRESSION_TYPE:
+          rewriteRawForwardIndex(column, segmentWriter, indexCreatorProvider);
+          break;
+        // TODO: Add other operations here.
+        default:
+          throw new IllegalStateException("Unsupported operation for column " + column);
+      }
+    }
+  }
+
+  @VisibleForTesting
+  Map<String, Operation> computeOperation(SegmentDirectory.Reader segmentReader)
+      throws Exception {
+    Map<String, Operation> columnOperationMap = new HashMap<>();
+
+    // Does not work for segment versions < V3
+    if (_segmentMetadata.getVersion().compareTo(SegmentVersion.v3) < 0) {
+      return columnOperationMap;
+    }
+
+    // From existing column config.
+    Set<String> existingAllColumns = _segmentMetadata.getAllColumns();
+    Set<String> existingDictColumns =
+        segmentReader.toSegmentDirectory().getColumnsWithIndex(ColumnIndexType.DICTIONARY);
+    Set<String> existingNoDictColumns = new HashSet<>();
+    for (String column : existingAllColumns) {
+      if (!existingDictColumns.contains(column)) {
+        existingNoDictColumns.add(column);
+      }
+    }
+
+    // From new column config.
+    Set<String> newNoDictColumns = _indexLoadingConfig.getNoDictionaryColumns();
+
+    for (String column : existingAllColumns) {

Review Comment:
   We can just iterate over `existingNoDictColumns`, IMO.
   
   We are not supporting (in this PR) enabling or disabling the dictionary on an existing column on the reload path anyway. So our assumption is that `existingNoDictColumns` and `newNoDictColumns` are exactly the same.
   
   If `newNoDictColumns` has more columns, it could mean either of two things:
   
   1. The user added a new column to the schema and marked it as noDict.
   2. The user marked an existing dict column in the schema as noDict.
   
   (1) is already supported in `BaseDefaultColumnHandler` and should already be handled before reaching here. (2) is not supported yet.
   
   So this code should ignore these columns (which you are already doing with the if check at line 129).
   
   If `newNoDictColumns` has fewer columns, it can only mean the following:
   
   - The user marked an existing noDict column in the schema as dict.
   
   This is also not supported, and we can ignore it.
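
A minimal sketch of the suggestion above, not the PR's actual code: it iterates only over `existingNoDictColumns` and skips any column whose dictionary setting would change. The loop body in the diff is truncated, so the compression comparison here is an assumption. The class name `ComputeOperationSketch`, the two compression maps, and the use of plain strings for compression types are all hypothetical stand-ins for the Pinot metadata and config objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained stand-in for ForwardIndexHandler.computeOperation.
class ComputeOperationSketch {
  enum Operation { CHANGE_RAW_INDEX_COMPRESSION_TYPE }

  static Map<String, Operation> computeOperation(Set<String> existingAllColumns,
      Set<String> existingDictColumns, Set<String> newNoDictColumns,
      Map<String, String> existingCompression, Map<String, String> newCompression) {
    // Columns that are currently stored raw (no dictionary) in the segment.
    Set<String> existingNoDictColumns = new HashSet<>(existingAllColumns);
    existingNoDictColumns.removeAll(existingDictColumns);

    Map<String, Operation> columnOperationMap = new HashMap<>();
    // Iterate only over existing noDict columns, as suggested in the review.
    for (String column : existingNoDictColumns) {
      if (!newNoDictColumns.contains(column)) {
        // noDict -> dict is not supported on the reload path yet; ignore.
        continue;
      }
      // Assumption: a rewrite is needed only when the configured type differs.
      String oldType = existingCompression.get(column);
      String newType = newCompression.get(column);
      if (newType != null && !newType.equals(oldType)) {
        columnOperationMap.put(column, Operation.CHANGE_RAW_INDEX_COMPRESSION_TYPE);
      }
    }
    return columnOperationMap;
  }

  public static void main(String[] args) {
    Set<String> all = new HashSet<>(Arrays.asList("a", "b", "c"));
    Set<String> dict = new HashSet<>(Arrays.asList("a"));
    // "d" is a new noDict column; "c" was noDict but is no longer listed.
    Set<String> newNoDict = new HashSet<>(Arrays.asList("b", "d"));

    Map<String, String> oldTypes = new HashMap<>();
    oldTypes.put("b", "SNAPPY");
    oldTypes.put("c", "LZ4");
    Map<String, String> newTypes = new HashMap<>();
    newTypes.put("b", "LZ4");
    newTypes.put("c", "SNAPPY");
    newTypes.put("d", "LZ4");

    System.out.println(computeOperation(all, dict, newNoDict, oldTypes, newTypes));
  }
}
```

With this shape, columns that appear only in `newNoDictColumns` (new columns, or dict columns newly marked noDict) are never visited, and existing noDict columns that were dropped from the new config are skipped explicitly, which matches the three cases listed above.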



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

