Copilot commented on code in PR #62056:
URL: https://github.com/apache/doris/pull/62056#discussion_r3027797950
##########
be/src/exec/operator/hashjoin_build_sink.cpp:
##########
@@ -237,7 +237,18 @@ Status HashJoinBuildSinkLocalState::close(RuntimeState*
state, Status exec_statu
if (p._use_shared_hash_table) {
std::unique_lock lock(p._mutex);
- p._signaled = true;
+ // Only signal non-builder tasks when the builder actually built
the hash table.
+ // When the builder is terminated (woken up early because the
probe side finished
+ // first), it never called process_build_block() so the hash table
variant is still
+ // monostate. Setting _signaled=true in that case would cause
non-builder tasks to
+ // enter std::visit on monostate and crash with "Hash table type
mismatch".
+ //
+ // _terminated is reliably true here when the task was woken up
early, because
+ // operator terminate() is called in the execute() Defer
(pipeline_task.cpp:510-512)
+ // BEFORE close() runs.
Review Comment:
The new comment ties the correctness of the `_terminated` check to specific
line numbers in `pipeline_task.cpp` ("pipeline_task.cpp:510-512"). Those line
references are brittle and will become stale as the file evolves; consider
referencing the function/logic instead (e.g., “in PipelineTask::execute()
Defer”) or linking to a more stable invariant/flag rather than hard-coded line
numbers.
```suggestion
// operator terminate() is called from the execute() Defer in
PipelineTask
// before close() is invoked.
```
##########
be/src/exec/operator/hashjoin_build_sink.cpp:
##########
@@ -840,10 +851,18 @@ Status HashJoinBuildSinkOperatorX::sink(RuntimeState*
state, Block* in_block, bo
RETURN_IF_ERROR(local_state.build_asof_index(*local_state._shared_state->build_block));
local_state.init_short_circuit_for_probe();
} else if (!local_state._should_build_hash_table) {
- // the instance which is not build hash table, it's should wait the
signal of hash table build finished.
- // but if it's running and signaled == false, maybe the source
operator have closed caused by some short circuit
- // return eof will make task marked as wake_up_early
- // todo: remove signaled after we can guarantee that wake up eraly is
always set accurately
+ // The non-builder instance waits for the builder (task 0) to finish
building the hash table.
+ // If _signaled is false, either the builder hasn't finished yet, or
the builder was
+ // terminated (woken up early) without building the hash table — in
both cases, return EOF.
+ //
+ // The close() Defer in the builder conditionally sets _signaled=true
ONLY when the builder
+ // was NOT terminated (i.e., the hash table was actually built). When
the builder is
+ // terminated, _signaled stays false, so non-builders always hit this
guard and return EOF
+ // safely — never reaching the std::visit on an uninitialized
(monostate) hash table.
+ //
+ // _terminated is also checked as a secondary guard. Note that
_terminated is always false
+ // inside sink() during execute() (it's set in the Defer AFTER
execute() returns), so it
+ // provides no protection against the race. The _signaled check is the
real guard.
if (!_signaled || local_state._terminated) {
return Status::Error<ErrorCode::END_OF_FILE>("source have closed");
Review Comment:
Typo/grammar in the END_OF_FILE message: "source have closed" should be
"source has closed" (or a more specific message about waiting for the shared
hash table). This string is user-visible in logs/errors and is currently
ungrammatical.
```suggestion
return Status::Error<ErrorCode::END_OF_FILE>("source has
closed");
```
##########
be/src/exec/operator/hashjoin_build_sink.cpp:
##########
@@ -840,10 +851,18 @@ Status HashJoinBuildSinkOperatorX::sink(RuntimeState*
state, Block* in_block, bo
RETURN_IF_ERROR(local_state.build_asof_index(*local_state._shared_state->build_block));
local_state.init_short_circuit_for_probe();
} else if (!local_state._should_build_hash_table) {
- // the instance which is not build hash table, it's should wait the
signal of hash table build finished.
- // but if it's running and signaled == false, maybe the source
operator have closed caused by some short circuit
- // return eof will make task marked as wake_up_early
- // todo: remove signaled after we can guarantee that wake up eraly is
always set accurately
+ // The non-builder instance waits for the builder (task 0) to finish
building the hash table.
+ // If _signaled is false, either the builder hasn't finished yet, or
the builder was
+ // terminated (woken up early) without building the hash table — in
both cases, return EOF.
+ //
+ // The close() Defer in the builder conditionally sets _signaled=true
ONLY when the builder
+ // was NOT terminated (i.e., the hash table was actually built). When
the builder is
+ // terminated, _signaled stays false, so non-builders always hit this
guard and return EOF
+ // safely — never reaching the std::visit on an uninitialized
(monostate) hash table.
+ //
+ // _terminated is also checked as a secondary guard. Note that
_terminated is always false
+ // inside sink() during execute() (it's set in the Defer AFTER
execute() returns), so it
+ // provides no protection against the race. The _signaled check is the
real guard.
if (!_signaled || local_state._terminated) {
Review Comment:
The guard `if (!_signaled || local_state._terminated)` is now accompanied by
a comment stating `_terminated` is always false inside `sink()` and provides no
protection against the race. That makes the `local_state._terminated` part look
redundant/confusing; consider either removing it or adjusting the comment to
explain when `_terminated` can actually be true at this point.
```suggestion
// At this point, termination is reflected solely through the value
of _signaled: a
// terminated builder never sets _signaled to true. Checking
!_signaled is therefore
// sufficient and serves as the real guard against racing with an
uninitialized hash table.
if (!_signaled) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]