Copilot commented on code in PR #64243:
URL: https://github.com/apache/doris/pull/64243#discussion_r3379763912


##########
be/src/storage/segment/segment_iterator.cpp:
##########
@@ -1056,15 +1056,20 @@ Status SegmentIterator::_apply_ann_topn_predicate() {
 
     size_t pre_size = _row_bitmap.cardinality();
     size_t rows_of_segment = _segment->num_rows();
-    if (static_cast<double>(pre_size) < static_cast<double>(rows_of_segment) * 
0.3) {
+    DORIS_CHECK(_opts.runtime_state != nullptr);
+    const auto user_params = _opts.runtime_state->get_vector_search_params();
+    if (user_params.should_fallback_ann_index_by_small_candidate(pre_size, 
rows_of_segment)) {

Review Comment:
   `StorageReadOptions::runtime_state` is explicitly nullable 
(iterators.h:132-134), and this method already handles a null runtime_state 
later when computing `enable_ann_index_result_cache`. The new 
`DORIS_CHECK(_opts.runtime_state != nullptr)` will hard-abort BE in any call 
path that sets `ann_topn_runtime` but not `runtime_state` (or in future 
refactors/tests). Prefer falling back to default `VectorSearchUserParams` when 
runtime_state is absent so the previous default behavior (0.3 ratio, 0 absolute 
threshold) is preserved without crashing.



##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -3584,6 +3588,38 @@ public boolean isEnableESParallelScroll() {
                     "IVF index nprobe parameter, controls the number of 
clusters to search"})
     public int ivfNprobe = 32;
 
+    @VarAttrDef.VarAttr(name = ANN_INDEX_CANDIDATE_ROWS_THRESHOLD, needForward 
= true,
+            checker = "checkAnnIndexCandidateRowsThreshold",
+            description = {"Skip ANN index when candidate rows before ANN 
search are less "
+                    + "than this threshold. 0 disables the absolute row 
threshold",
+                    "Skip ANN index when candidate rows before ANN search are 
less "
+                            + "than this threshold. 0 disables the absolute 
row threshold"})
+    public long annIndexCandidateRowsThreshold = 0;
+
+    @VarAttrDef.VarAttr(name = ANN_INDEX_CANDIDATE_ROWS_PERCENT_THRESHOLD, 
needForward = true,
+            checker = "checkAnnIndexCandidateRowsPercentThreshold",
+            description = {"Skip ANN index when candidate row ratio before ANN 
search is less "
+                    + "than this threshold",
+                    "Skip ANN index when candidate row ratio before ANN search 
is less "
+                            + "than this threshold"})
+    public double annIndexCandidateRowsPercentThreshold = 0.3;
+
+    public void checkAnnIndexCandidateRowsThreshold(String value) {
+        long threshold = Long.parseLong(value);
+        if (threshold < 0) {
+            throw new InvalidParameterException(
+                    ANN_INDEX_CANDIDATE_ROWS_THRESHOLD + " should be greater 
than or equal to 0");
+        }
+    }
+
+    public void checkAnnIndexCandidateRowsPercentThreshold(String value) {
+        double threshold = Double.parseDouble(value);
+        if (threshold < 0 || threshold > 1) {
+            throw new InvalidParameterException(
+                    ANN_INDEX_CANDIDATE_ROWS_PERCENT_THRESHOLD + " should be 
between 0 and 1");
+        }
+    }

Review Comment:
   `checkAnnIndexCandidateRowsPercentThreshold` currently allows `NaN` (e.g. 
`SET ann_index_candidate_rows_percent_threshold = 'NaN'`) because `NaN < 0` and 
`NaN > 1` are both false. That silently disables the percent-threshold logic in 
BE (`NaN > 0` is also false) and makes the variable behave unexpectedly. Reject 
non-finite values explicitly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to