xiangfu0 opened a new pull request, #18114: URL: https://github.com/apache/pinot/pull/18114
## Summary

- **Backend capability model**: `VectorBackendCapabilities` declares 5 query-time capabilities per backend (topKAnn, filterAwareSearch, approximateRadius, exactRerank, runtimeSearchParams), wired into `VectorBackendType.getCapabilities()`
- **Execution mode enum**: `VectorExecutionMode` defines 8 explicit modes (ANN_TOP_K, ANN_TOP_K_WITH_RERANK, ANN_THEN_FILTER, ANN_THEN_FILTER_THEN_RERANK, FILTER_THEN_ANN, ANN_THRESHOLD_SCAN, ANN_THRESHOLD_THEN_FILTER, EXACT_SCAN) with centralized selection logic
- **Filtered ANN semantics**: `FilterPlanNode` detects AND(VECTOR_SIMILARITY, ...) patterns and over-fetches ANN candidates (2x) to compensate for post-filter loss; execution mode is explicit in explain output
- **Threshold/radius search**: New `vectorDistanceThreshold` query option enables distance-based filtering via ANN candidate generation + exact threshold refinement from forward index; works in both indexed and exact-scan fallback paths
- **Compound retrieval**: Filter + top-K, filter + threshold, and top-K + threshold patterns all wired with correct execution mode reporting
- **Explain/debug**: Execution mode now visible in both human-readable and structured explain output for all vector queries

## Design

See `docs/design/vector-backends-phase3.md` for the full design note covering execution modes, capability model, mode selection rules, query options, and limitations.

## Backward Compatibility

All existing VECTOR_SIMILARITY queries work unchanged. No SQL, schema, table config, or wire protocol changes. The new `vectorDistanceThreshold` query option is purely additive.
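For readers who want a concrete picture of how a capability model can drive centralized mode selection, here is a minimal sketch in Python (chosen for brevity; the PR itself is Java). The mode names and the five capability flags come from the Summary above. The selection rules, the `Capabilities` dataclass, the `select_mode` function, and its parameters are assumptions made for illustration only; the actual rules are the ones in `docs/design/vector-backends-phase3.md` and may differ.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class VectorExecutionMode(Enum):
    # The eight mode names listed in the PR summary.
    ANN_TOP_K = auto()
    ANN_TOP_K_WITH_RERANK = auto()
    ANN_THEN_FILTER = auto()
    ANN_THEN_FILTER_THEN_RERANK = auto()
    FILTER_THEN_ANN = auto()
    ANN_THRESHOLD_SCAN = auto()
    ANN_THRESHOLD_THEN_FILTER = auto()
    EXACT_SCAN = auto()


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical stand-in for VectorBackendCapabilities (five flags from the summary).
    top_k_ann: bool
    filter_aware_search: bool
    approximate_radius: bool
    exact_rerank: bool
    runtime_search_params: bool


def select_mode(caps: Capabilities, has_index: bool, has_filter: bool,
                threshold: Optional[float], rerank: bool) -> VectorExecutionMode:
    """Assumed selection rules, not the PR's actual logic."""
    M = VectorExecutionMode
    # Without a usable ANN index, fall back to scanning the forward index.
    if not has_index or not caps.top_k_ann:
        return M.EXACT_SCAN
    # A distance threshold takes precedence over plain top-K in this sketch.
    if threshold is not None:
        return M.ANN_THRESHOLD_THEN_FILTER if has_filter else M.ANN_THRESHOLD_SCAN
    use_rerank = rerank and caps.exact_rerank
    if has_filter:
        # A filter-aware backend can restrict the search up front.
        if caps.filter_aware_search:
            return M.FILTER_THEN_ANN
        return M.ANN_THEN_FILTER_THEN_RERANK if use_rerank else M.ANN_THEN_FILTER
    return M.ANN_TOP_K_WITH_RERANK if use_rerank else M.ANN_TOP_K


# Hypothetical backend: ANN top-K and rerank, but no filter-aware search.
example_backend = Capabilities(top_k_ann=True, filter_aware_search=False,
                               approximate_radius=False, exact_rerank=True,
                               runtime_search_params=True)
print(select_mode(example_backend, has_index=True, has_filter=True,
                  threshold=None, rerank=False).name)  # prints ANN_THEN_FILTER
```

Keeping the decision in one function means the explain output can report whatever mode that function returned, rather than each operator inferring it separately.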
## Test plan

- [x] `VectorBackendCapabilitiesTest`: capability model for all backends (8 tests)
- [x] `VectorExecutionModeTest`: mode properties and flag consistency (6 tests)
- [x] `VectorBackendTypeTest`: existing + new capability integration (8 tests)
- [x] `VectorQueryExecutionContextTest`: mode selection logic for all query shapes (16 tests)
- [x] `VectorSearchParamsTest`: threshold parsing, negative thresholds for dot-product (19 tests)
- [x] `VectorSimilarityFilterOperatorTest`: filtered ANN over-fetch, threshold refinement, execution mode reporting (21 tests)
- [x] `VectorCompoundQueryTest`: compound patterns: filter+topK, filter+threshold, backward compat (9 tests)
- [x] Checkstyle, spotless, and license checks pass on all modified modules

🤖 Generated with [Claude Code](https://claude.com/claude-code)
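The filtered over-fetch and threshold refinement behaviors exercised by `VectorSimilarityFilterOperatorTest` and `VectorCompoundQueryTest` can be pictured with the following Python sketch. Only the 2x over-fetch factor and the idea of refining ANN candidates against exact distances from the forward index come from the PR description. Everything else is an assumption for illustration: the function names, the dict used as a forward index, the brute-force stand-in for the ANN index, and the use of Euclidean distance.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

OVER_FETCH_FACTOR = 2  # the 2x over-fetch mentioned in the PR summary


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ann_candidates(forward_index: Dict[int, List[float]],
                   query: Sequence[float], k: int) -> List[int]:
    # Stand-in for a real ANN index: exact nearest neighbors by brute force.
    ranked = sorted(forward_index, key=lambda d: euclidean(forward_index[d], query))
    return ranked[:k]


def search(forward_index: Dict[int, List[float]], query: Sequence[float], top_k: int,
           doc_filter: Optional[Callable[[int], bool]] = None,
           threshold: Optional[float] = None) -> List[Tuple[int, float]]:
    # Over-fetch only when a post-filter may discard candidates.
    fetch = top_k * OVER_FETCH_FACTOR if doc_filter is not None else top_k
    results = []
    for doc_id in ann_candidates(forward_index, query, fetch):
        if doc_filter is not None and not doc_filter(doc_id):
            continue
        # Exact refinement: recompute the distance from the stored vector.
        dist = euclidean(forward_index[doc_id], query)
        if threshold is not None and dist > threshold:
            continue
        results.append((doc_id, dist))
    return results[:top_k]


vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0], 3: [3.0, 0.0], 4: [4.0, 4.0]}
is_even = lambda doc_id: doc_id % 2 == 0

# Filter + top-K: 4 candidates are fetched so that 2 survive the filter.
print(search(vectors, [0.0, 0.0], top_k=2, doc_filter=is_even))
# Filter + threshold: the same candidates, further cut at distance 1.5.
print(search(vectors, [0.0, 0.0], top_k=2, doc_filter=is_even, threshold=1.5))
```

Without the over-fetch, the first query would retrieve only documents 0 and 1 and return a single row after filtering, which is the post-filter loss the 2x factor is meant to offset.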
