This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new eb61c26d471 [opt](scanner) set number of file scanner to max_scanners_concurrency (#59622)
eb61c26d471 is described below
commit eb61c26d471169c4b4552a3dddd1737620abde2a
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Fri Jan 9 13:07:40 2026 +0800
[opt](scanner) set number of file scanner to max_scanners_concurrency (#59622)
### What problem does this PR solve?
Problem Summary:
For external tables, each scanner is not bound to a specific split.
Instead, when a scanner is scheduled, it dynamically fetches the next scan
range from a unified split source for scanning.
Therefore, the number of scanners only needs to match
max_scanners_concurrency to ensure full-speed execution.
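To make this pull model concrete, here is a minimal C++ sketch, assuming the "unified split source" behaves like a mutex-protected queue that every scanner drains. SplitSource, ScanRange and run_scanners are hypothetical names for illustration, not Doris's actual classes. Because no scanner owns a fixed set of splits, every range is processed regardless of how many scanners exist; the scanner count only has to be large enough to keep the available concurrency busy.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one scan range (split) of an external table.
struct ScanRange {
    int id;
};

// Hypothetical unified split source: all scanners pull from the same queue,
// so no scanner is bound to a fixed subset of splits.
class SplitSource {
public:
    explicit SplitSource(std::vector<ScanRange> ranges) : ranges_(std::move(ranges)) {}

    // Returns the next unclaimed range, or std::nullopt once all are handed out.
    std::optional<ScanRange> next() {
        std::lock_guard<std::mutex> lock(mu_);
        if (pos_ >= ranges_.size()) return std::nullopt;
        return ranges_[pos_++];
    }

private:
    std::mutex mu_;
    std::vector<ScanRange> ranges_;
    std::size_t pos_ = 0;
};

// Starts `num_scanners` threads; each keeps fetching ranges until the source
// is drained. Returns how many ranges each scanner processed.
std::vector<int> run_scanners(SplitSource& source, int num_scanners) {
    std::vector<int> processed(num_scanners, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_scanners; ++i) {
        threads.emplace_back([&source, &processed, i] {
            // Each thread writes only its own slot, so no extra locking is needed.
            while (auto range = source.next()) {
                ++processed[i];  // a real scanner would read *range here
            }
        });
    }
    for (auto& t : threads) t.join();
    return processed;
}
```

For example, 100 ranges drained by 16 scanners are each processed exactly once, however the work happens to be divided among the threads.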
It also fixes a profile issue.
Before:
```
PerScannerRunningTime: [7.341us, 15.372us, 5.987us, 9.738us, 10.630us,
21.631us, 7.539us, 7.586us, 6.247us, 12.755us, 10.989us, 12.221us, 18.952us,
3.450us, 7.805us, 12.291us, 1s282ms, 1s242ms, 1s263ms, 1s363ms, 1s228ms,
1s283ms, 1s267ms, 1s273ms, 1s177ms, 1s271ms, 1s197ms, 1s351ms, 1s357ms,
1s460ms, 1s253ms, 4s469ms, ]
```
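The running time is not even: 16 of the 32 scanners ran for only a few
microseconds, while the other 16 ran for more than a second each.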
After:
```
- PerScannerRunningTime: [287.588ms, 324.711ms, 282.664ms, 299.930ms,
269.238ms, 321.864ms, 314.268ms, 309.313ms, 315.368ms, 332.571ms, 290.192ms,
278.908ms, 335.692ms, 275.525ms, 322.447ms, 346.342ms, ]
```
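One way to quantify "not even" is to compare the slowest and fastest scanner in each profile. The following small C++ sketch is not part of the commit; spread_ratio and count_below are hypothetical helpers, and the values are the PerScannerRunningTime entries copied from the two profiles above, converted to milliseconds.

```cpp
#include <algorithm>
#include <vector>

// Ratio between the slowest and fastest scanner; 1.0 means perfectly even.
double spread_ratio(const std::vector<double>& times_ms) {
    auto [lo, hi] = std::minmax_element(times_ms.begin(), times_ms.end());
    return *hi / *lo;
}

// Number of scanners that ran for less than `threshold_ms`.
long count_below(const std::vector<double>& times_ms, double threshold_ms) {
    return std::count_if(times_ms.begin(), times_ms.end(),
                         [threshold_ms](double t) { return t < threshold_ms; });
}

// Before profile in ms (1us = 0.001 ms, 1s282ms = 1282 ms).
const std::vector<double> kBeforeMs = {
    0.007341, 0.015372, 0.005987, 0.009738, 0.010630, 0.021631, 0.007539, 0.007586,
    0.006247, 0.012755, 0.010989, 0.012221, 0.018952, 0.003450, 0.007805, 0.012291,
    1282, 1242, 1263, 1363, 1228, 1283, 1267, 1273,
    1177, 1271, 1197, 1351, 1357, 1460, 1253, 4469};

// After profile in ms.
const std::vector<double> kAfterMs = {
    287.588, 324.711, 282.664, 299.930, 269.238, 321.864, 314.268, 309.313,
    315.368, 332.571, 290.192, 278.908, 335.692, 275.525, 322.447, 346.342};
```

For the Before profile, 16 of 32 scanners ran for under 1 ms and the slowest/fastest ratio exceeds one million; for the After profile, no scanner ran for under 1 ms and the ratio is roughly 1.29.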
---
be/src/pipeline/exec/file_scan_operator.cpp | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/be/src/pipeline/exec/file_scan_operator.cpp b/be/src/pipeline/exec/file_scan_operator.cpp
index ddb29ccdf51..2ffa0e64465 100644
--- a/be/src/pipeline/exec/file_scan_operator.cpp
+++ b/be/src/pipeline/exec/file_scan_operator.cpp
@@ -134,9 +134,13 @@ void FileScanLocalState::set_scan_ranges(RuntimeState* state,
auto calc_max_scanners = [&](int parallel_instance_num) -> int {
int max_scanners =
vectorized::ScannerScheduler::default_remote_scan_thread_num() /
parallel_instance_num;
- if (should_run_serial()) {
- max_scanners = 1;
- }
+ // For external tables, each scanner is not bound to specific splits.
+ // Instead, when a scanner is scheduled, it dynamically fetches the next scan range
+ // from a unified split source for scanning.
+ // Therefore, the number of scanners only needs to match "max_scanners_concurrency"
+ // to ensure full-speed execution.
+ // For 32 core node, the default "max_scanners_concurrency" should be 16
+ max_scanners = std::min(max_scanners, max_scanners_concurrency(state));
return max_scanners;
};
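Stripped of the Doris types, the new calculation is a division followed by a cap. The following is a sketch under that simplification: the function and parameter names are hypothetical, with plain integers standing in for vectorized::ScannerScheduler::default_remote_scan_thread_num() and max_scanners_concurrency(state).

```cpp
#include <algorithm>

// Hypothetical standalone version of calc_max_scanners after this commit.
// remote_scan_threads stands in for default_remote_scan_thread_num(),
// scanners_concurrency stands in for max_scanners_concurrency(state).
int calc_max_scanners(int remote_scan_threads, int parallel_instance_num,
                      int scanners_concurrency) {
    // Share of the remote scan thread pool available to one instance.
    int max_scanners = remote_scan_threads / parallel_instance_num;
    // Scanners pull splits from a shared source, so matching the concurrency
    // limit is enough for full-speed execution; cap the count there.
    return std::min(max_scanners, scanners_concurrency);
}
```

With illustrative numbers (64 remote scan threads is an assumption, not a documented Doris default, and 16 is the concurrency the diff comment gives for a 32-core node), 2 parallel instances would get min(32, 16) = 16 scanners and 8 instances would get min(8, 16) = 8. The diff also removes the branch that forced a single scanner when should_run_serial() was true, so calc_max_scanners no longer returns 1 in that case.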