Copilot commented on code in PR #18114:
URL: https://github.com/apache/pinot/pull/18114#discussion_r3043574214
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/VectorSimilarityFilterOperator.java:
##########
@@ -239,7 +283,29 @@ private ImmutableRoaringBitmap executeSearch() {
column);
}
- // 4. Apply exact rerank if requested
+ // 4. Apply threshold refinement if distance threshold is set.
+ // NOTE: This is approximate threshold search. The ANN index generates a
candidate pool
+ // (controlled by vectorMaxCandidates, default topK*10), and only those
candidates are
+ // checked against the threshold. Vectors within the threshold but
outside the ANN candidate
+ // pool will be missed. For exact threshold search, use a table without
a vector index
+ // (ExactVectorScanFilterOperator handles threshold correctly via
brute-force scan).
+ // Increase vectorMaxCandidates to improve recall at the cost of latency.
+ if (_hasThresholdPredicate && _forwardIndexReader == null) {
+ LOGGER.warn("Vector distance threshold was requested on column: {} but
no forward index reader is available. "
+ + "Returning raw ANN top-K results without threshold
refinement.",
+ column);
+ }
+ if (_hasThresholdPredicate && _forwardIndexReader != null &&
annCandidateCount > 0) {
Review Comment:
When `vectorDistanceThreshold` is set but `_forwardIndexReader` is null, the
operator currently logs a warning and returns raw ANN results without applying
the threshold. That violates the stated threshold semantics (results should be
constrained by the threshold). It would be safer to either (a) fail the query
with a clear error, or (b) fall back to an exact scan using the forward index
(if available) rather than silently ignoring the threshold.
```suggestion
throw new IllegalStateException(
"Vector distance threshold was requested on column: " + column
+ " but no forward index reader is available to apply
threshold refinement");
}
if (_hasThresholdPredicate && annCandidateCount > 0) {
```
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/VectorQueryExecutionContext.java:
##########
@@ -0,0 +1,233 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.filter;
+
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.index.creator.VectorBackendCapabilities;
+import org.apache.pinot.segment.spi.index.creator.VectorBackendType;
+import org.apache.pinot.segment.spi.index.creator.VectorExecutionMode;
+
+
+/**
+ * Captures the runtime execution plan for a vector query within a single
segment.
+ *
+ * <p>This context is populated by the query planner (in {@code
FilterPlanNode}) and carried
+ * through to the filter operator so that execution decisions are explicit and
reportable.
+ * It is also surfaced in explain/debug output.</p>
+ *
+ * <p>The context is immutable once created. Use the {@link Builder} to
construct instances.</p>
+ */
+public final class VectorQueryExecutionContext {
+
+ private final VectorExecutionMode _executionMode;
+ private final VectorBackendType _backendType;
+ private final VectorBackendCapabilities _capabilities;
+ private final boolean _exactRerank;
+ private final boolean _hasMetadataFilter;
+ private final boolean _hasThresholdPredicate;
+ private final int _topK;
+ private final int _candidateBudget;
+ private final float _distanceThreshold;
+ @Nullable
+ private final String _fallbackReason;
+
+ private VectorQueryExecutionContext(Builder builder) {
+ _executionMode = builder._executionMode;
+ _backendType = builder._backendType;
+ _capabilities = builder._capabilities;
+ _exactRerank = builder._exactRerank;
+ _hasMetadataFilter = builder._hasMetadataFilter;
+ _hasThresholdPredicate = builder._hasThresholdPredicate;
+ _topK = builder._topK;
+ _candidateBudget = builder._candidateBudget;
+ _distanceThreshold = builder._distanceThreshold;
+ _fallbackReason = builder._fallbackReason;
+ }
+
+ public VectorExecutionMode getExecutionMode() {
+ return _executionMode;
+ }
+
+ public VectorBackendType getBackendType() {
+ return _backendType;
+ }
+
+ public VectorBackendCapabilities getCapabilities() {
+ return _capabilities;
+ }
+
+ public boolean isExactRerank() {
+ return _exactRerank;
+ }
+
+ public boolean hasMetadataFilter() {
+ return _hasMetadataFilter;
+ }
+
+ public boolean hasThresholdPredicate() {
+ return _hasThresholdPredicate;
+ }
+
+ public int getTopK() {
+ return _topK;
+ }
+
+ public int getCandidateBudget() {
+ return _candidateBudget;
+ }
+
+ public float getDistanceThreshold() {
+ return _distanceThreshold;
+ }
+
+ @Nullable
+ public String getFallbackReason() {
+ return _fallbackReason;
+ }
+
+ @Override
+ public String toString() {
+ StringBuilder sb = new StringBuilder("VectorQueryExecutionContext{");
+ sb.append("mode=").append(_executionMode);
+ sb.append(", backend=").append(_backendType);
+ sb.append(", topK=").append(_topK);
+ if (_hasMetadataFilter) {
+ sb.append(", hasFilter=true");
+ }
+ if (_hasThresholdPredicate) {
+ sb.append(", threshold=").append(_distanceThreshold);
+ }
+ if (_exactRerank) {
+ sb.append(", rerank=true, candidateBudget=").append(_candidateBudget);
+ }
+ if (_fallbackReason != null) {
+ sb.append(", fallbackReason=").append(_fallbackReason);
+ }
+ sb.append('}');
+ return sb.toString();
+ }
+
+ /**
+ * Selects the execution mode for a vector query based on query shape.
+ *
+ * <p>This is the centralized decision point. The rules are conservative and
explicit:</p>
+ * <ol>
+ * <li>No vector index -> EXACT_SCAN</li>
+ * <li>Threshold query + filter -> ANN_THRESHOLD_THEN_FILTER</li>
+ * <li>Threshold query (no filter) -> ANN_THRESHOLD_SCAN</li>
+ * <li>Filter + rerank -> ANN_THEN_FILTER_THEN_RERANK</li>
+ * <li>Filter (no rerank) -> ANN_THEN_FILTER</li>
+ * <li>Rerank (no filter) -> ANN_TOP_K_WITH_RERANK</li>
+ * <li>Plain top-K -> ANN_TOP_K</li>
+ * </ol>
+ */
+ public static VectorExecutionMode selectExecutionMode(boolean hasVectorIndex,
+ boolean hasMetadataFilter, boolean hasThresholdPredicate, boolean
exactRerank) {
+ if (!hasVectorIndex) {
+ return VectorExecutionMode.EXACT_SCAN;
+ }
+ if (hasThresholdPredicate && hasMetadataFilter) {
+ return VectorExecutionMode.ANN_THRESHOLD_THEN_FILTER;
+ }
+ if (hasThresholdPredicate) {
+ return VectorExecutionMode.ANN_THRESHOLD_SCAN;
+ }
+ if (hasMetadataFilter && exactRerank) {
+ return VectorExecutionMode.ANN_THEN_FILTER_THEN_RERANK;
+ }
+ if (hasMetadataFilter) {
+ return VectorExecutionMode.ANN_THEN_FILTER;
+ }
+ if (exactRerank) {
+ return VectorExecutionMode.ANN_TOP_K_WITH_RERANK;
+ }
+ return VectorExecutionMode.ANN_TOP_K;
+ }
Review Comment:
The PR description and the new `VectorExecutionMode` enum include
`FILTER_THEN_ANN`, but `selectExecutionMode(...)` has no branch that can ever
return this mode (and it also does not consult backend capabilities). This
means `FILTER_THEN_ANN` is currently unreachable and will never appear in
explain output. Either wire this mode into the selection logic (e.g., when a
backend declares filter-aware search) or remove/park it to avoid confusion and
drift between docs and behavior.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java:
##########
@@ -705,6 +705,30 @@ public static Integer getVectorMaxCandidates(Map<String,
String> queryOptions) {
return checkedParseIntPositive(QueryOptionKey.VECTOR_MAX_CANDIDATES,
maxCandidates);
}
+ /**
+ * Returns the distance threshold for vector radius/threshold search, or
{@code null} if not set.
+ */
+ @Nullable
+ public static Float getVectorDistanceThreshold(Map<String, String>
queryOptions) {
+ String threshold =
queryOptions.get(QueryOptionKey.VECTOR_DISTANCE_THRESHOLD);
+ if (threshold == null) {
+ return null;
+ }
+ try {
+ float value = Float.parseFloat(threshold);
Review Comment:
`getVectorDistanceThreshold()` parses the option with
`Float.parseFloat(threshold)` without trimming whitespace, unlike other numeric
parsers in this class (e.g., `checkedParseDoublePositive` uses `trim()`). This
can cause valid `SET vectorDistanceThreshold = 0.5` variants with surrounding
spaces to fail parsing. Consider using `threshold.trim()` before parsing for
consistency and robustness.
```suggestion
String trimmedThreshold = threshold.trim();
try {
float value = Float.parseFloat(trimmedThreshold);
```
##########
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java:
##########
@@ -371,30 +377,66 @@ private BaseFilterOperator
constructPhysicalOperator(FilterContext filter, int n
* <li>If no vector index exists, fall back to {@link
ExactVectorScanFilterOperator} which
* performs brute-force scan of the forward index.</li>
* </ol>
+ *
+ * @param hasMetadataFilter true if this vector predicate is combined with
metadata filters (AND)
*/
private BaseFilterOperator constructVectorSimilarityOperator(DataSource
dataSource,
- VectorSimilarityPredicate predicate, String column, int numDocs) {
+ VectorSimilarityPredicate predicate, String column, int numDocs, boolean
hasMetadataFilter) {
VectorIndexReader vectorIndex = dataSource.getVectorIndex();
VectorIndexConfig vectorIndexConfig = dataSource.getVectorIndexConfig();
VectorSearchParams searchParams =
VectorSearchParams.fromQueryOptions(_queryContext.getQueryOptions());
if (vectorIndex != null) {
- // ANN index path: pass forward index reader only if rerank is enabled
+ // ANN index path: pass forward index reader if rerank or threshold
search requires exact distances
ForwardIndexReader<?> forwardIndexReader = null;
VectorBackendType backendType =
VectorDistanceUtils.resolveBackendType(vectorIndexConfig);
- if (searchParams.isExactRerank(backendType)) {
+ if (searchParams.isExactRerank(backendType) ||
searchParams.hasDistanceThreshold()) {
forwardIndexReader = dataSource.getForwardIndex();
}
return new VectorSimilarityFilterOperator(vectorIndex, predicate,
numDocs, searchParams, forwardIndexReader,
- vectorIndexConfig);
+ vectorIndexConfig, hasMetadataFilter);
}
Review Comment:
For threshold queries, `forwardIndexReader` is required to compute exact
distances for refinement. Here the code proceeds even if
`dataSource.getForwardIndex()` returns null (passing null into
`VectorSimilarityFilterOperator`), which then causes the threshold to be
skipped at runtime. Consider adding an explicit check: if
`searchParams.hasDistanceThreshold()` and `forwardIndexReader == null`, either
throw a query error or force the exact-scan fallback path so threshold
semantics remain correct.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]