xiangfu0 commented on code in PR #18446:
URL: https://github.com/apache/pinot/pull/18446#discussion_r3211631223


##########
pinot-common/src/main/java/org/apache/pinot/common/partition/function/BoundedColumnValuePartitionFunction.java:
##########
@@ -16,15 +16,18 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.pinot.segment.spi.partition;
+package org.apache.pinot.common.partition.function;
 
 import com.google.common.base.Preconditions;
 import java.util.Map;
 import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.segment.spi.partition.PartitionFunction;
+import org.apache.pinot.segment.spi.partition.PartitionIntNormalizer;
+import org.apache.pinot.spi.annotations.PartitionFunctionType;
 
 
 /**
- * Implementation of {@link PartitionFunction} which partitions based 
configured column values.
+ * Implementation of {@link PartitionFunction} which partitions based on 
configured column values.

Review Comment:
   Done — converted all new Javadoc to JEP 467 `///` markdown syntax in commit 
31ce6ba and again touched up here.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java:
##########
@@ -57,4 +59,19 @@ public interface PartitionFunction extends Serializable {
   default Map<String, String> getFunctionConfig() {
     return null;
   }
+
+  /**
+   * Reports the {@link PartitionIntNormalizer} that drives this partition 
function's int-to-id
+   * mapping. The built-in implementations now apply the named normalizer 
directly, so the value is
+   * authoritative — recomputing a partition via
+   * {@code 
PartitionIntNormalizer.valueOf(getPartitionIdNormalizer()).getPartitionId(rawHash,
 numPartitions)}
+   * yields the same result as {@link #getPartition(String)} for the same raw 
hash input.
+   *
+   * <p>Used by the framework for identity / staleness matching between 
config-side and segment-side
+   * partition metadata. Each implementation must declare its own value — 
there is intentionally no
+   * default. Plug-ins whose output is already in {@code [0, numPartitions)} 
should return
+   * {@link PartitionIntNormalizer#POSITIVE_MODULO} (a no-op label).
+   */
+  @JsonIgnore

Review Comment:
   Done — `@JsonIgnore` removed and the return type is now 
`PartitionIdNormalizer` (the enum) instead of `String`. The 
`testBasicProperties` JSON-shape assertion was updated to expect the new 
`partitionIdNormalizer` field.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java:
##########
@@ -18,86 +18,118 @@
  */
 package org.apache.pinot.segment.spi.partition;
 
+import com.google.common.base.Preconditions;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Modifier;
+import java.util.Collections;
 import java.util.HashMap;
+import java.util.Locale;
 import java.util.Map;
 import javax.annotation.Nullable;
 import org.apache.pinot.segment.spi.partition.metadata.ColumnPartitionMetadata;
+import org.apache.pinot.spi.annotations.PartitionFunctionType;
 import org.apache.pinot.spi.config.table.ColumnPartitionConfig;
+import org.apache.pinot.spi.utils.PinotReflectionUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
- * Factory to build instances of {@link PartitionFunction}.
+ * Dynamic registry for {@link PartitionFunction} implementations.
+ *
+ * <p>Discovery is driven by classpath scanning for classes annotated with
+ * {@link PartitionFunctionType}. Annotated classes must:
+ * <ul>
+ *   <li>be public and implement {@link PartitionFunction}</li>
+ *   <li>live under a package matching the regex {@code 
.*\.partition\.function\..*}
+ *       (e.g. {@code org.apache.pinot.common.partition.function} or any 
plugin package
+ *       that follows the same convention)</li>
+ *   <li>expose a public constructor with signature
+ *       {@code (int numPartitions, Map<String, String> functionConfig)}</li>
+ * </ul>
+ *
+ * <p>The static block scans the classpath once and builds an immutable
+ * (canonicalized name → constructor) map. Instances are created on demand by
+ * {@link #getPartitionFunction(String, int, Map)}.
+ *
+ * <p>To force eager initialization (e.g. so the scan happens before the first 
segment
+ * is read), call {@link #init()} from broker/server/controller startup.
  */
 public class PartitionFunctionFactory {
-  // Enum for various partition functions to be added.
-  public enum PartitionFunctionType {
-    Modulo, Murmur, Murmur2, Murmur3, Fnv, ByteArray, HashCode, 
BoundedColumnValue;
-    // Add more functions here.
+  private PartitionFunctionFactory() {
+  }
 
-    private static final Map<String, PartitionFunctionType> VALUE_MAP = new 
HashMap<>();
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(PartitionFunctionFactory.class);
+  private static final String SCAN_REGEX = ".*\\.partition\\.function\\..*";
 
-    static {
-      for (PartitionFunctionType functionType : 
PartitionFunctionType.values()) {
-        VALUE_MAP.put(functionType.name().toLowerCase(), functionType);
-      }
-    }
-
-    public static PartitionFunctionType fromString(String name) {
-      PartitionFunctionType functionType = VALUE_MAP.get(name.toLowerCase());
+  private static final Map<String, Constructor<? extends PartitionFunction>> 
REGISTRY;
 
-      if (functionType == null) {
-        throw new IllegalArgumentException("No enum constant for: " + name);
+  static {
+    long startTimeMs = System.currentTimeMillis();
+    Map<String, Constructor<? extends PartitionFunction>> registry = new 
HashMap<>();
+    for (Class<?> clazz : 
PinotReflectionUtils.getClassesThroughReflection(SCAN_REGEX, 
PartitionFunctionType.class)) {
+      if (!Modifier.isPublic(clazz.getModifiers()) || 
!PartitionFunction.class.isAssignableFrom(clazz)) {

Review Comment:
   Acknowledged — clarified the Javadoc to state that scanning is rooted at 
`org.apache.pinot.*`. Per Pinot's plugin convention, third-party plugins live 
under that tree, so the documented behavior now matches the implementation. 
(See follow-up commit.)



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java:
##########
@@ -108,4 +140,8 @@ public static PartitionFunction 
getPartitionFunction(ColumnPartitionConfig confi
   public static PartitionFunction getPartitionFunction(ColumnPartitionMetadata 
metadata) {
     return getPartitionFunction(metadata.getFunctionName(), 
metadata.getNumPartitions(), metadata.getFunctionConfig());
   }
+
+  private static String canonicalize(String name) {
+    return name.toLowerCase(Locale.ROOT);

Review Comment:
   Done — Javadoc updated to drop the "stripping underscores" claim. Lookup is 
just lower-case canonicalization.



##########
pinot-spi/src/main/java/org/apache/pinot/spi/annotations/PartitionFunctionType.java:
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.annotations;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+
+/**
+ * Marker annotation for plug-in {@code PartitionFunction} implementations.
+ *
+ * <p>Classes annotated with this annotation are auto-discovered at startup by
+ * {@code PartitionFunctionFactory} via classpath scanning. Each annotated 
class must:
+ * <ul>
+ *   <li>Implement {@code 
org.apache.pinot.segment.spi.partition.PartitionFunction}</li>
+ *   <li>Be public</li>
+ *   <li>Live under a package matching {@code .*\.partition\.function\..*} 
(e.g.
+ *       {@code org.apache.pinot.common.partition.function} or any plugin 
package
+ *       that follows the same convention)</li>

Review Comment:
   Done — annotation-based scan is no longer the only entry point. The factory 
now scans for `PartitionFunction` subtypes and falls back to `getName()` when 
the annotation is absent (see Jackie's review thread on the same point). 
Existing `PartitionFunction` impls without the annotation work.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java:
##########
@@ -57,4 +59,19 @@ public interface PartitionFunction extends Serializable {
   default Map<String, String> getFunctionConfig() {
     return null;
   }
+
+  /**
+   * Reports the {@link PartitionIntNormalizer} that drives this partition 
function's int-to-id
+   * mapping. The built-in implementations now apply the named normalizer 
directly, so the value is
+   * authoritative — recomputing a partition via
+   * {@code 
PartitionIntNormalizer.valueOf(getPartitionIdNormalizer()).getPartitionId(rawHash,
 numPartitions)}
+   * yields the same result as {@link #getPartition(String)} for the same raw 
hash input.
+   *
+   * <p>Used by the framework for identity / staleness matching between 
config-side and segment-side
+   * partition metadata. Each implementation must declare its own value — 
there is intentionally no
+   * default. Plug-ins whose output is already in {@code [0, numPartitions)} 
should return
+   * {@link PartitionIntNormalizer#POSITIVE_MODULO} (a no-op label).
+   */
+  @JsonIgnore
+  String getPartitionIdNormalizer();

Review Comment:
   Resolved by Jackie's request that every implementation declare its own 
normalizer. Method stays abstract; built-ins all override. Existing third-party 
impls without the annotation are still discoverable via the new `getName()` 
fallback in the factory, but they too must implement the abstract method 
(acceptable break per the PR's `backward-incompat` label).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to