Jackie-Jiang commented on a change in pull request #8224:
URL: https://github.com/apache/pinot/pull/8224#discussion_r811505110



##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/ColumnStatistics.java
##########
@@ -98,5 +99,7 @@ default int getMaxRowLengthInBytes() {
 
   int getNumPartitions();
 
+  Map<String, String> getFunctionConfig();

Review comment:
       ```suggestion
     Map<String, String> getPartitionFunctionConfig();
   ```

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
##########
@@ -46,6 +51,15 @@ public String getFunctionName() {
     return _functionName;
   }
 
+  /**
+   * Returns the partition function configuration for the column.
+   *
+   * @return Partition function configuration.
+   */
+  public Map<String, String> getFunctionConfig() {

Review comment:
       Annotate it with `@Nullabel`
   ```suggestion
     @Nullable
     public Map<String, String> getFunctionConfig() {
   ```

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -126,7 +126,8 @@ private PartitionInfo 
extractPartitionInfoFromSegmentZKMetadataZNRecord(String s
     }
 
     return new 
PartitionInfo(PartitionFunctionFactory.getPartitionFunction(columnPartitionMetadata.getFunctionName(),
-        columnPartitionMetadata.getNumPartitions()), 
columnPartitionMetadata.getPartitions());
+        columnPartitionMetadata.getNumPartitions(), 
columnPartitionMetadata.getFunctionConfig()),
+            columnPartitionMetadata.getPartitions());

Review comment:
       (code format) Can you please setup the [Pinot 
Style](https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup#intellij)
 and reformat the changes?

##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
##########
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based 
configured column values.
+ *
+ * "columnPartitionMap": {
+ *   "subject": {
+ *     "functionName": "BoundedColumnValue",
+ *     "functionConfig": {
+ *       "columnValues": "Maths|English|Chemistry"
+ *     }
+ *   }
+ * }
+ * With this partition config on column "subject", partitionId would be 1 for 
Maths, 2 for English and so on.
+ * partitionId would be "0" for all other values which is not configured but 
may occur.
+ */
+public class BoundedColumnValuePartitionFunction implements PartitionFunction {
+  private static final int DEFAULT_PARTITION_ID = 0;
+  private static final String NAME = "BoundedColumnValue";
+  private static final String COLUMN_VALUES = "columnValues";
+  private final Map<String, String> _functionConfig;
+  private final String[] _values;
+  private final int _noOfValues;
+
+  public BoundedColumnValuePartitionFunction(Map<String, String> 
functionConfig) {
+    Preconditions.checkArgument(functionConfig != null && 
functionConfig.size() > 0,
+        "functionConfig should be present, specified", functionConfig);
+    Preconditions.checkState(functionConfig.get(COLUMN_VALUES) != null, 
"columnValues must be configured");
+    _functionConfig = functionConfig;
+    _values = functionConfig.get(COLUMN_VALUES).split("\\|");
+    _noOfValues = _values.length;
+  }
+
+  @Override
+  public int getPartition(Object valueIn) {
+    String value = valueIn instanceof String ? (String) valueIn : 
String.valueOf(valueIn);

Review comment:
       (minor) Same as `value.toString()`

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
##########
@@ -21,20 +21,25 @@
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import java.util.Map;
 import org.apache.pinot.spi.config.BaseJsonConfig;
 
 
 public class ColumnPartitionConfig extends BaseJsonConfig {
   private final String _functionName;
   private final int _numPartitions;
+  private final Map<String, String> _functionConfig;
 
   @JsonCreator
   public ColumnPartitionConfig(@JsonProperty(value = "functionName", required 
= true) String functionName,
-      @JsonProperty(value = "numPartitions", required = true) int 
numPartitions) {
+      @JsonProperty(value = "numPartitions", required = false) int 
numPartitions,

Review comment:
       Suggest not changing this to optional because it is used for segment 
management. Within the `BoundedColumnValuePartitionFunction`, we can validate 
whether it matches the column values provided

##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
##########
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based 
configured column values.
+ *
+ * "columnPartitionMap": {
+ *   "subject": {
+ *     "functionName": "BoundedColumnValue",
+ *     "functionConfig": {
+ *       "columnValues": "Maths|English|Chemistry"
+ *     }
+ *   }
+ * }
+ * With this partition config on column "subject", partitionId would be 1 for 
Maths, 2 for English and so on.
+ * partitionId would be "0" for all other values which is not configured but 
may occur.
+ */
+public class BoundedColumnValuePartitionFunction implements PartitionFunction {
+  private static final int DEFAULT_PARTITION_ID = 0;
+  private static final String NAME = "BoundedColumnValue";
+  private static final String COLUMN_VALUES = "columnValues";
+  private final Map<String, String> _functionConfig;
+  private final String[] _values;
+  private final int _noOfValues;
+
+  public BoundedColumnValuePartitionFunction(Map<String, String> 
functionConfig) {
+    Preconditions.checkArgument(functionConfig != null && 
functionConfig.size() > 0,
+        "functionConfig should be present, specified", functionConfig);
+    Preconditions.checkState(functionConfig.get(COLUMN_VALUES) != null, 
"columnValues must be configured");
+    _functionConfig = functionConfig;
+    _values = functionConfig.get(COLUMN_VALUES).split("\\|");

Review comment:
       Suggest making the delimiter configurable to support values with `|`

##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
##########
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based 
configured column values.
+ *
+ * "columnPartitionMap": {
+ *   "subject": {
+ *     "functionName": "BoundedColumnValue",
+ *     "functionConfig": {
+ *       "columnValues": "Maths|English|Chemistry"
+ *     }
+ *   }
+ * }
+ * With this partition config on column "subject", partitionId would be 1 for 
Maths, 2 for English and so on.
+ * partitionId would be "0" for all other values which is not configured but 
may occur.
+ */
+public class BoundedColumnValuePartitionFunction implements PartitionFunction {
+  private static final int DEFAULT_PARTITION_ID = 0;
+  private static final String NAME = "BoundedColumnValue";
+  private static final String COLUMN_VALUES = "columnValues";
+  private final Map<String, String> _functionConfig;
+  private final String[] _values;

Review comment:
       Make it a map to accelerate the lookup?

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
##########
@@ -21,20 +21,25 @@
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
+import java.util.Map;
 import org.apache.pinot.spi.config.BaseJsonConfig;
 
 
 public class ColumnPartitionConfig extends BaseJsonConfig {
   private final String _functionName;
   private final int _numPartitions;
+  private final Map<String, String> _functionConfig;
 
   @JsonCreator
   public ColumnPartitionConfig(@JsonProperty(value = "functionName", required 
= true) String functionName,

Review comment:
       Let's add another constructor without `functionConfig`, so that we don't 
need to change the existing usage of this method

##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
##########
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import java.util.Map;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based 
configured column values.
+ *
+ * "columnPartitionMap": {
+ *   "subject": {
+ *     "functionName": "BoundedColumnValue",
+ *     "functionConfig": {
+ *       "columnValues": "Maths|English|Chemistry"
+ *     }
+ *   }
+ * }
+ * With this partition config on column "subject", partitionId would be 1 for 
Maths, 2 for English and so on.

Review comment:
       Should we consider putting other values as the last partition in case 
all values are already configured, and no value goes into partition 0?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to