Copilot commented on code in PR #18114:
URL: https://github.com/apache/pinot/pull/18114#discussion_r3043574214


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/VectorSimilarityFilterOperator.java:
##########
@@ -239,7 +283,29 @@ private ImmutableRoaringBitmap executeSearch() {
             column);
       }
 
-      // 4. Apply exact rerank if requested
+      // 4. Apply threshold refinement if distance threshold is set.
+      // NOTE: This is approximate threshold search. The ANN index generates a 
candidate pool
+      // (controlled by vectorMaxCandidates, default topK*10), and only those 
candidates are
+      // checked against the threshold. Vectors within the threshold but 
outside the ANN candidate
+      // pool will be missed. For exact threshold search, use a table without 
a vector index
+      // (ExactVectorScanFilterOperator handles threshold correctly via 
brute-force scan).
+      // Increase vectorMaxCandidates to improve recall at the cost of latency.
+      if (_hasThresholdPredicate && _forwardIndexReader == null) {
+        LOGGER.warn("Vector distance threshold was requested on column: {} but 
no forward index reader is available. "
+                + "Returning raw ANN top-K results without threshold 
refinement.",
+            column);
+      }
+      if (_hasThresholdPredicate && _forwardIndexReader != null && 
annCandidateCount > 0) {

Review Comment:
   When `vectorDistanceThreshold` is set but `_forwardIndexReader` is null, the 
operator currently logs a warning and returns raw ANN results without applying 
the threshold. That violates the stated threshold semantics (results should be 
constrained by the threshold). It would be safer to either (a) fail the query 
with a clear error, or (b) fall back to an exact scan using the forward index 
(if available) rather than silently ignoring the threshold.
   ```suggestion
           throw new IllegalStateException(
               "Vector distance threshold was requested on column: " + column
                   + " but no forward index reader is available to apply 
threshold refinement");
         }
         if (_hasThresholdPredicate && annCandidateCount > 0) {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/VectorQueryExecutionContext.java:
##########
@@ -0,0 +1,233 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.filter;
+
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.index.creator.VectorBackendCapabilities;
+import org.apache.pinot.segment.spi.index.creator.VectorBackendType;
+import org.apache.pinot.segment.spi.index.creator.VectorExecutionMode;
+
+
+/**
+ * Captures the runtime execution plan for a vector query within a single 
segment.
+ *
+ * <p>This context is populated by the query planner (in {@code 
FilterPlanNode}) and carried
+ * through to the filter operator so that execution decisions are explicit and 
reportable.
+ * It is also surfaced in explain/debug output.</p>
+ *
+ * <p>The context is immutable once created. Use the {@link Builder} to 
construct instances.</p>
+ */
+public final class VectorQueryExecutionContext {
+
+  private final VectorExecutionMode _executionMode;
+  private final VectorBackendType _backendType;
+  private final VectorBackendCapabilities _capabilities;
+  private final boolean _exactRerank;
+  private final boolean _hasMetadataFilter;
+  private final boolean _hasThresholdPredicate;
+  private final int _topK;
+  private final int _candidateBudget;
+  private final float _distanceThreshold;
+  @Nullable
+  private final String _fallbackReason;
+
+  private VectorQueryExecutionContext(Builder builder) {
+    _executionMode = builder._executionMode;
+    _backendType = builder._backendType;
+    _capabilities = builder._capabilities;
+    _exactRerank = builder._exactRerank;
+    _hasMetadataFilter = builder._hasMetadataFilter;
+    _hasThresholdPredicate = builder._hasThresholdPredicate;
+    _topK = builder._topK;
+    _candidateBudget = builder._candidateBudget;
+    _distanceThreshold = builder._distanceThreshold;
+    _fallbackReason = builder._fallbackReason;
+  }
+
+  public VectorExecutionMode getExecutionMode() {
+    return _executionMode;
+  }
+
+  public VectorBackendType getBackendType() {
+    return _backendType;
+  }
+
+  public VectorBackendCapabilities getCapabilities() {
+    return _capabilities;
+  }
+
+  public boolean isExactRerank() {
+    return _exactRerank;
+  }
+
+  public boolean hasMetadataFilter() {
+    return _hasMetadataFilter;
+  }
+
+  public boolean hasThresholdPredicate() {
+    return _hasThresholdPredicate;
+  }
+
+  public int getTopK() {
+    return _topK;
+  }
+
+  public int getCandidateBudget() {
+    return _candidateBudget;
+  }
+
+  public float getDistanceThreshold() {
+    return _distanceThreshold;
+  }
+
+  @Nullable
+  public String getFallbackReason() {
+    return _fallbackReason;
+  }
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder("VectorQueryExecutionContext{");
+    sb.append("mode=").append(_executionMode);
+    sb.append(", backend=").append(_backendType);
+    sb.append(", topK=").append(_topK);
+    if (_hasMetadataFilter) {
+      sb.append(", hasFilter=true");
+    }
+    if (_hasThresholdPredicate) {
+      sb.append(", threshold=").append(_distanceThreshold);
+    }
+    if (_exactRerank) {
+      sb.append(", rerank=true, candidateBudget=").append(_candidateBudget);
+    }
+    if (_fallbackReason != null) {
+      sb.append(", fallbackReason=").append(_fallbackReason);
+    }
+    sb.append('}');
+    return sb.toString();
+  }
+
+  /**
+   * Selects the execution mode for a vector query based on query shape.
+   *
+   * <p>This is the centralized decision point. The rules are conservative and 
explicit:</p>
+   * <ol>
+   *   <li>No vector index -> EXACT_SCAN</li>
+   *   <li>Threshold query + filter -> ANN_THRESHOLD_THEN_FILTER</li>
+   *   <li>Threshold query (no filter) -> ANN_THRESHOLD_SCAN</li>
+   *   <li>Filter + rerank -> ANN_THEN_FILTER_THEN_RERANK</li>
+   *   <li>Filter (no rerank) -> ANN_THEN_FILTER</li>
+   *   <li>Rerank (no filter) -> ANN_TOP_K_WITH_RERANK</li>
+   *   <li>Plain top-K -> ANN_TOP_K</li>
+   * </ol>
+   */
+  public static VectorExecutionMode selectExecutionMode(boolean hasVectorIndex,
+      boolean hasMetadataFilter, boolean hasThresholdPredicate, boolean 
exactRerank) {
+    if (!hasVectorIndex) {
+      return VectorExecutionMode.EXACT_SCAN;
+    }
+    if (hasThresholdPredicate && hasMetadataFilter) {
+      return VectorExecutionMode.ANN_THRESHOLD_THEN_FILTER;
+    }
+    if (hasThresholdPredicate) {
+      return VectorExecutionMode.ANN_THRESHOLD_SCAN;
+    }
+    if (hasMetadataFilter && exactRerank) {
+      return VectorExecutionMode.ANN_THEN_FILTER_THEN_RERANK;
+    }
+    if (hasMetadataFilter) {
+      return VectorExecutionMode.ANN_THEN_FILTER;
+    }
+    if (exactRerank) {
+      return VectorExecutionMode.ANN_TOP_K_WITH_RERANK;
+    }
+    return VectorExecutionMode.ANN_TOP_K;
+  }

Review Comment:
   The PR description and the new `VectorExecutionMode` enum include 
`FILTER_THEN_ANN`, but `selectExecutionMode(...)` has no branch that can ever 
return this mode (and it also does not consult backend capabilities). This 
means `FILTER_THEN_ANN` is currently unreachable and will never appear in 
explain output. Either wire this mode into the selection logic (e.g., when a 
backend declares filter-aware search) or remove/park it to avoid confusion and 
drift between docs and behavior.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java:
##########
@@ -705,6 +705,30 @@ public static Integer getVectorMaxCandidates(Map<String, 
String> queryOptions) {
     return checkedParseIntPositive(QueryOptionKey.VECTOR_MAX_CANDIDATES, 
maxCandidates);
   }
 
+  /**
+   * Returns the distance threshold for vector radius/threshold search, or 
{@code null} if not set.
+   */
+  @Nullable
+  public static Float getVectorDistanceThreshold(Map<String, String> 
queryOptions) {
+    String threshold = 
queryOptions.get(QueryOptionKey.VECTOR_DISTANCE_THRESHOLD);
+    if (threshold == null) {
+      return null;
+    }
+    try {
+      float value = Float.parseFloat(threshold);

Review Comment:
   `getVectorDistanceThreshold()` parses the option with 
`Float.parseFloat(threshold)` without trimming whitespace, unlike other numeric 
parsers in this class (e.g., `checkedParseDoublePositive` uses `trim()`). This 
can cause valid `SET vectorDistanceThreshold = 0.5` variants with surrounding 
spaces to fail parsing. Consider using `threshold.trim()` before parsing for 
consistency and robustness.
   ```suggestion
       String trimmedThreshold = threshold.trim();
       try {
         float value = Float.parseFloat(trimmedThreshold);
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java:
##########
@@ -371,30 +377,66 @@ private BaseFilterOperator 
constructPhysicalOperator(FilterContext filter, int n
    *   <li>If no vector index exists, fall back to {@link 
ExactVectorScanFilterOperator} which
    *       performs brute-force scan of the forward index.</li>
    * </ol>
+   *
+   * @param hasMetadataFilter true if this vector predicate is combined with 
metadata filters (AND)
    */
   private BaseFilterOperator constructVectorSimilarityOperator(DataSource 
dataSource,
-      VectorSimilarityPredicate predicate, String column, int numDocs) {
+      VectorSimilarityPredicate predicate, String column, int numDocs, boolean 
hasMetadataFilter) {
     VectorIndexReader vectorIndex = dataSource.getVectorIndex();
     VectorIndexConfig vectorIndexConfig = dataSource.getVectorIndexConfig();
     VectorSearchParams searchParams = 
VectorSearchParams.fromQueryOptions(_queryContext.getQueryOptions());
 
     if (vectorIndex != null) {
-      // ANN index path: pass forward index reader only if rerank is enabled
+      // ANN index path: pass forward index reader if rerank or threshold 
search requires exact distances
       ForwardIndexReader<?> forwardIndexReader = null;
       VectorBackendType backendType = 
VectorDistanceUtils.resolveBackendType(vectorIndexConfig);
-      if (searchParams.isExactRerank(backendType)) {
+      if (searchParams.isExactRerank(backendType) || 
searchParams.hasDistanceThreshold()) {
         forwardIndexReader = dataSource.getForwardIndex();
       }
       return new VectorSimilarityFilterOperator(vectorIndex, predicate, 
numDocs, searchParams, forwardIndexReader,
-          vectorIndexConfig);
+          vectorIndexConfig, hasMetadataFilter);
     }

Review Comment:
   For threshold queries, `forwardIndexReader` is required to compute exact 
distances for refinement. Here the code proceeds even if 
`dataSource.getForwardIndex()` returns null (passing null into 
`VectorSimilarityFilterOperator`), which then causes the threshold to be 
skipped at runtime. Consider adding an explicit check: if 
`searchParams.hasDistanceThreshold()` and `forwardIndexReader == null`, either 
throw a query error or force the exact-scan fallback path so threshold 
semantics remain correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to