Gabriel39 commented on a change in pull request #8393:
URL: https://github.com/apache/incubator-doris/pull/8393#discussion_r830056206



##########
File path: be/src/vec/exec/join/vhash_join_node.cpp
##########
@@ -160,7 +161,7 @@ struct ProcessHashTableProbe {
             : _join_node(join_node),
               _batch_size(batch_size),
               _probe_rows(probe_rows),
-              _build_blocks(join_node->_build_blocks),
+              
_build_blocks(join_node->_shared_hash_table_ctx._shared_structure->_build_blocks),

Review comment:
       I have a question for `process_data_in_hashtable` in this structure. 
Assume that for a right outer join, we have FI1 and FI2 share one hash table. 
If FI1 ended its probing before FI2, FI1 execute `process_data_in_hashtable` 
first. However, if a record (`record1`) doesn't match any record in FI1 but 
match some records in FI2, when FI1 execute `process_data_in_hashtable`, FI1 
will assume `record1` doesn't match any and output NULL and record1 (line 549). 
After that, when FI2's probing continue, `record1` matches some records 
actually. So finally we got a wrong answer because FI1 output an unexpected 
row. Will this happen? Pls point out my mistake if I misunderstood this 
process. Thx

##########
File path: be/src/runtime/shared_hash_table.cpp
##########
@@ -0,0 +1,154 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "runtime/shared_hash_table.h"
+
+#include <mutex>
+#include <string>
+#include "runtime/exec_env.h"
+#include "runtime/mem_tracker.h"
+#include "runtime/plan_fragment_executor.h"
+#include "util/time.h"
+
+namespace doris {
+void SharedHashTableVal::get_callback(vectorized::shared_hash_table_operator* 
hash_table_accessor,
+                                        vectorized::shared_hash_table_barrier* 
hash_table_barrier) {
+    *hash_table_accessor = 
std::bind(&SharedHashTableVal::shared_hash_table_operate,
+                                     this, std::placeholders::_1, 
std::placeholders::_2);
+
+    *hash_table_barrier = 
std::bind(&SharedHashTableVal::shared_hash_table_barrier, this);
+}
+
+bool SharedHashTableVal::shared_hash_table_operate(bool is_leader, 
std::shared_ptr<vectorized::SharedStructure>& shared_structure) {
+    std::unique_lock<std::mutex> unique_lock(_mutex);
+    if (is_leader) {
+        // set the hash table variants
+        _shared_structure = shared_structure;
+        // wakeup follower.
+        _condition_setup.notify_all();
+    } else {
+        // get the hash table variants
+        if (!_shared_structure) {
+            // weakup to handle cancel event.
+            if (_condition_setup.wait_for(unique_lock, 
std::chrono::milliseconds(::doris::config::thrift_rpc_timeout_ms)) == 
std::cv_status::timeout) {
+                return false;
+            }
+        }
+        shared_structure = _shared_structure;
+    }
+    return true;
+}
+
+bool SharedHashTableVal::shared_hash_table_barrier() {
+    std::unique_lock<std::mutex> unique_lock(_mutex);
+    _sharers_count--;
+    if (_sharers_count == 0) {
+         _condition_barrier.notify_all();
+    } else {
+        while (_sharers_count > 0) {
+            if (_condition_barrier.wait_for(unique_lock, 
std::chrono::milliseconds(::doris::config::thrift_rpc_timeout_ms)) == 
std::cv_status::timeout) {

Review comment:
       how about to add a new config build_shared_hash_table_timeout instead of 
thrift_rpc_timeout_ms?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to