Copilot commented on code in PR #60577:
URL: https://github.com/apache/doris/pull/60577#discussion_r2773377927


##########
regression-test/suites/query_p0/join/test_left_join1.groovy:
##########
@@ -23,6 +23,25 @@ suite("test_left_join1", "query,p0,arrow_flight_sql") {
 
     sql """insert into ${tableName} values (1, 123),(2, 124),(3, 125),(4, 
126);"""
 
+    qt_select """ SELECT
+                          *
+                          FROM
+                  ( SELECT f_key, f_value FROM ${tableName} ) a
+                  LEFT JOIN ( SELECT f_key, f_value FROM ${tableName} ) b ON 
a.f_key = b.f_key
+                  LEFT JOIN (
+                          SELECT
+                  *
+                  FROM
+                  ${tableName}
+                  WHERE
+                  f_key IN ( SELECT f_key FROM ${tableName} WHERE f_key IN ( 
SELECT f_key FROM ${tableName} WHERE f_value > 123 ) )
+                  ) c ON a.f_key = c.f_key
+                  ORDER BY
+                  a.f_key; 
+             """

Review Comment:
   The test should explicitly verify that the LEFT_SEMI_DIRECT_RETURN_OPT 
feature is being used. Consider adding a check in the test (e.g., via EXPLAIN 
or query profile) to confirm that "LeftSemiDirectReturn" is set to true, 
ensuring that the optimization is actually triggered and tested.



##########
be/src/olap/column_predicate.h:
##########
@@ -329,8 +329,10 @@ class ColumnPredicate : public 
std::enable_shared_from_this<ColumnPredicate> {
     void attach_profile_counter(
             int filter_id, std::shared_ptr<RuntimeProfile::Counter> 
predicate_filtered_rows_counter,
             std::shared_ptr<RuntimeProfile::Counter> 
predicate_input_rows_counter,
-            std::shared_ptr<RuntimeProfile::Counter> 
predicate_always_true_rows_counter) {
+            std::shared_ptr<RuntimeProfile::Counter> 
predicate_always_true_rows_counter,
+            const RuntimeFilterSelectivity& rf_selectivity) {
         _runtime_filter_id = filter_id;
+        _rf_selectivity = rf_selectivity;

Review Comment:
   Copying RuntimeFilterSelectivity by value in attach_profile_counter may lead 
to unexpected behavior. The selectivity tracking state (_judge_input_rows, 
_judge_filter_rows, _judge_counter, _always_true, _sampling_frequency) is 
copied, but the two instances will then track independently. This means changes 
to the original RuntimeFilterSelectivity won't be reflected in the 
ColumnPredicate's copy. Consider whether this should be a reference (perhaps 
stored as a pointer) or if the copy semantics are intentional.



##########
be/src/runtime_filter/runtime_filter_wrapper.h:
##########
@@ -139,6 +153,10 @@ class RuntimeFilterWrapper {
     std::shared_ptr<BloomFilterFuncBase> _bloom_filter_func;
     std::shared_ptr<BitmapFilterFuncBase> _bitmap_filter_func;
 
+    // disable always_true logic if detected in filter
+    // to make left_semi_direct_return_opt work correctly
+    bool _disable_always_true_logic = false;

Review Comment:
   The _disable_always_true_logic field should be std::atomic<bool> instead of 
plain bool. While there appears to be a happens-before relationship through the 
signaling mechanism between the producer setting this flag and the consumer 
reading it, using an atomic type would provide clearer thread-safety guarantees 
and prevent potential issues from compiler optimizations or memory reordering. 
The wrapper is explicitly documented as being shared between producer and 
consumer (line 160-161), making thread-safety critical.



##########
be/src/runtime_filter/runtime_filter_consumer.h:
##########
@@ -93,8 +93,10 @@ class RuntimeFilterConsumer : public RuntimeFilter {
               _registration_time(MonotonicMillis()),
               _rf_state(State::NOT_READY) {
         // If bitmap filter is not applied, it will cause the query result to 
be incorrect
+        // local rf must wait until timeout, otherwise it may lead results 
incorrectness, because LEFT_SEMI_DIRECT_RETURN_OPT

Review Comment:
   The comment should clarify the specific scenario. The issue is: when 
LEFT_SEMI_DIRECT_RETURN_OPT is enabled, the probe operator bypasses the hash 
table lookup and directly outputs all probe rows. For this to be correct, the 
local IN filter must be applied to all probe rows before they reach the probe 
operator. If the consumer times out and doesn't wait for the local filter, 
unfiltered rows would be incorrectly output. Consider expanding the comment to 
explain this specific correctness requirement more explicitly.
   ```suggestion
           // When LEFT_SEMI_DIRECT_RETURN_OPT is enabled, the probe operator 
can bypass the hash
           // table lookup and directly return all probe rows. For the query 
result to be correct,
           // the local IN/bitmap runtime filter must have been applied to 
every probe row before
           // it reaches that probe operator. If a local consumer "gives up" 
too early (does not
           // wait for the local filter to arrive and does not wait until 
timeout), unfiltered
           // probe rows would be passed through and returned, leading to 
incorrect query results.
           // Therefore, for certain local filters (e.g. bitmap filters or when 
there is no remote
           // target), we force the consumer to wait until timeout (or until 
the filter is ready).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to