This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new eb61c26d471 [opt](scanner) set number of file scanner to max_scanners_concurrency (#59622)
eb61c26d471 is described below

commit eb61c26d471169c4b4552a3dddd1737620abde2a
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Fri Jan 9 13:07:40 2026 +0800

    [opt](scanner) set number of file scanner to max_scanners_concurrency (#59622)
    
    ### What problem does this PR solve?
    
    Problem Summary:
    
    For external tables, each scanner is not bound to a specific split.
    Instead, when a scanner is scheduled,
    it dynamically fetches the next scan range from a unified split source
    for scanning.
    Therefore, the number of scanners only needs to match
    max_scanners_concurrency to ensure full-speed execution.
    
    It also fixes a profile issue.
    Before:
    ```
    PerScannerRunningTime: [7.341us, 15.372us, 5.987us, 9.738us, 10.630us, 
21.631us, 7.539us, 7.586us, 6.247us, 12.755us, 10.989us, 12.221us, 18.952us, 
3.450us, 7.805us, 12.291us, 1s282ms, 1s242ms, 1s263ms, 1s363ms, 1s228ms, 
1s283ms, 1s267ms, 1s273ms, 1s177ms, 1s271ms, 1s197ms, 1s351ms, 1s357ms, 
1s460ms, 1s253ms, 4s469ms, ]
    ```
    The running times are uneven: 16 scanners finish in microseconds
    while the other 16 each run for more than a second.
    
    After:
    ```
    - PerScannerRunningTime: [287.588ms, 324.711ms, 282.664ms, 299.930ms, 
269.238ms, 321.864ms, 314.268ms, 309.313ms, 315.368ms, 332.571ms, 290.192ms, 
278.908ms, 335.692ms, 275.525ms, 322.447ms, 346.342ms, ]
    ```
---
 be/src/pipeline/exec/file_scan_operator.cpp | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/be/src/pipeline/exec/file_scan_operator.cpp 
b/be/src/pipeline/exec/file_scan_operator.cpp
index ddb29ccdf51..2ffa0e64465 100644
--- a/be/src/pipeline/exec/file_scan_operator.cpp
+++ b/be/src/pipeline/exec/file_scan_operator.cpp
@@ -134,9 +134,13 @@ void FileScanLocalState::set_scan_ranges(RuntimeState* state,
     auto calc_max_scanners = [&](int parallel_instance_num) -> int {
         int max_scanners = vectorized::ScannerScheduler::default_remote_scan_thread_num() /
                            parallel_instance_num;
-        if (should_run_serial()) {
-            max_scanners = 1;
-        }
+        // For external tables, each scanner is not bound to specific splits.
+        // Instead, when a scanner is scheduled, it dynamically fetches the next scan range
+        // from a unified split source for scanning.
+        // Therefore, the number of scanners only needs to match "max_scanners_concurrency"
+        // to ensure full-speed execution.
+        // For 32 core node, the default "max_scanners_concurrency" should be 16
+        max_scanners = std::min(max_scanners, max_scanners_concurrency(state));
         return max_scanners;
     };
 

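The reasoning behind the change can be illustrated with a small standalone sketch. It is not Doris code: `SplitSource`, `drain_splits`, and the free function `calc_max_scanners` below are hypothetical stand-ins, and the thread-pool size and concurrency values are passed in as plain integers rather than read from `ScannerScheduler` or `RuntimeState`. The sketch assumes `parallel_instance_num` is positive and models scheduling as a simple single-threaded round-robin.

```cpp
#include <algorithm>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical stand-in for the unified split source: scanners do not own
// splits, they pull the next one whenever they are scheduled.
// (A real shared source would need synchronization; this sketch is
// single-threaded.)
class SplitSource {
public:
    explicit SplitSource(int num_splits) {
        for (int i = 0; i < num_splits; ++i) {
            splits_.push(i);
        }
    }

    // Writes the next split id to *out and returns true, or returns false
    // once the source is exhausted.
    bool next(int* out) {
        if (splits_.empty()) {
            return false;
        }
        *out = splits_.front();
        splits_.pop();
        return true;
    }

private:
    std::queue<int> splits_;
};

// Mirrors the clamp added in the diff: the per-instance share of the scan
// thread pool, capped at the concurrency limit.
int calc_max_scanners(int remote_scan_thread_num, int parallel_instance_num,
                      int max_scanners_concurrency) {
    int max_scanners = remote_scan_thread_num / parallel_instance_num;
    return std::min(max_scanners, max_scanners_concurrency);
}

// Simulates round-robin scheduling: each scheduled scanner pulls one split
// per turn until the source is empty. Returns the number of splits each
// scanner processed.
std::vector<int> drain_splits(int num_scanners, int num_splits) {
    SplitSource source(num_splits);
    std::vector<int> processed(num_scanners, 0);
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (int i = 0; i < num_scanners; ++i) {
            int split_id = 0;
            if (source.next(&split_id)) {
                ++processed[i];
                progressed = true;
            }
        }
    }
    return processed;
}
```

With illustrative inputs, `calc_max_scanners(48, 1, 16)` is capped at 16, and `drain_splits(16, 100)` still consumes all 100 splits because every scanner keeps pulling from the shared source until it is empty. Creating more scanners than can run concurrently adds no throughput in this model.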
