Mryange opened a new pull request, #65140:
URL: https://github.com/apache/doris/pull/65140
### What problem does this PR solve?
We are investigating an intermittent query timeout in which a pipeline task
made up of an adaptive passthrough local exchange source, a cross join, and a
sort sink stayed runnable for almost 900s. The current evidence suggests the
task is not blocked waiting for LocalExchange input. All local exchange sink
operators have finished (`_running_sink_operators: 0`), the source queue still
holds data (`size approx = 7`), and all dependencies of the runnable task are
ready. We suspect the cross join stops making progress when it receives an
oversized block (one with more rows than `batch_size`) from the adaptive
passthrough local exchange.
Relevant debug info from the timeout log:
```text
PipelineFragmentContext Info: _closed_tasks=22, _total_tasks=24,
need_notify_close=false, fragment_id=10, _rec_cte_stage=0
Task 0: QueryId: 160c012c0d7e49b0-bb5252d26339a281
InstanceId: 160c012c0d7e49b0-bb5252d26339a2af
PipelineTask[id = 0, open = true, eos = false, state = BLOCKED, dry run =
false, _wake_up_early = false, _wake_up_by = -1, time elapsed since last state
changing = 899s, spilling = false, is running = false] elapse time = 899s,
block dependency = [LOCAL_MERGE_SORT_SOURCE_OPERATOR_DEPENDENCY: id=16, block
task = 1, ready=false, _always_ready=false]
LOCAL_MERGE_SORT_SOURCE_OPERATOR: id=16, parallel_tasks=3,
_is_serial_operator=false
Task 1: QueryId: 160c012c0d7e49b0-bb5252d26339a281
InstanceId: 160c012c0d7e49b0-bb5252d26339a2af
PipelineTask[id = 1, open = true, eos = false, state = RUNNABLE, dry run =
false, _wake_up_early = false, _wake_up_by = -1, time elapsed since last state
changing = 895s, spilling = false, is running = true] elapse time = 899s, block
dependency = [NULL]
LOCAL_EXCHANGE_OPERATOR(ADAPTIVE_PASSTHROUGH): id=20, parallel_tasks=3,
_is_serial_operator=false, _channel_id: 0, _num_partitions: 3, _num_senders: 3,
_num_sources: 3, _running_sink_operators: 0, _running_source_operators: 1,
mem_usage: 1205248, data queue info: Data Queue 0: [size approx = 7, eos =
false], MemTrackers: 0: 1205248, 1: 1203200, 2: 1205248,
CROSS_JOIN_OPERATOR: id=15, parallel_tasks=3, _is_serial_operator=false
SORT_SINK_OPERATOR: id=16, _is_serial_operator=false
Read Dependency Information:
0. LOCAL_EXCHANGE_OPERATOR_DEPENDENCY: id=-3, block task = 0, ready=true,
_always_ready=true
1. CROSS_JOIN_OPERATOR_DEPENDENCY: id=15, block task = 0, ready=true,
_always_ready=false
3. MemorySufficientDependency: id=-1, block task = 0, ready=true,
_always_ready=true
Write Dependency Information:
3. SORT_SINK_OPERATOR_DEPENDENCY: id=16, block task = 0, ready=true,
_always_ready=false
```
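The triage argument above can be restated as a small predicate over fields that appear in the dump. This is an illustrative sketch only: `LocalExchangeSnapshot`, `DependencySnapshot`, and `is_runnable_without_input_wait` are hypothetical names, not Doris APIs.

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot types built from a timeout dump; not Doris code.
struct DependencySnapshot {
    std::string name;
    bool ready;
};

struct LocalExchangeSnapshot {
    int running_sink_operators;  // "_running_sink_operators" in the dump
    int queue_size_approx;       // "size approx" of the source's data queue
};

// Returns true when the task cannot be waiting for local exchange input:
// every upstream sink has finished, the queue still holds blocks, and
// every read/write dependency of the task is ready.
bool is_runnable_without_input_wait(const LocalExchangeSnapshot& exchange,
                                    const std::vector<DependencySnapshot>& deps) {
    if (exchange.running_sink_operators != 0) {
        return false;  // an upstream sink may still be producing data
    }
    if (exchange.queue_size_approx <= 0) {
        return false;  // empty queue: waiting on input would be plausible
    }
    for (const auto& dep : deps) {
        if (!dep.ready) {
            return false;  // the task is legitimately blocked on this dependency
        }
    }
    return true;
}
```

With the Task 1 values (`_running_sink_operators: 0`, `size approx = 7`, every listed dependency `ready=true`) the predicate returns `true`, which is consistent with the suspicion that progress stops inside the cross join rather than at the local exchange.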
This change adds defensive diagnostics to the common operator output and
sink input paths that assert block rows never exceed `batch_size`. It also
extends the nested loop join probe debug output with the current child block
rows, join block rows, probe/build cursor positions, build block row counts,
and whether the operator is using the build-base generation path. With these
details, the next timeout dump should show whether the cross join is holding
an oversized probe or build block and whether it has stopped making progress.
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->
--
This is an automated message from the Apache Git Service.