zhiqiang-hhhh opened a new pull request, #56796:
URL: https://github.com/apache/doris/pull/56796

   This pull request introduces a new mechanism for managing OpenMP thread 
usage during concurrent FAISS vector index builds in Doris, improving resource 
control and stability. It adds a global thread budget guard to ensure that the 
total number of threads used does not exceed a configurable limit, and provides 
metrics for monitoring thread usage. Additionally, thread naming is temporarily 
set for easier debugging during index build phases.
   
   **Resource Management Improvements:**
   
   * Added a global thread budget guard (`ScopedOmpThreadBudget`) that caps the 
total number of OpenMP threads used by concurrent FAISS index builds at the 
configured `omp_threads_limit`. This replaces the previous per-thread static 
allocation and allows dynamic adjustment based on available resources. 
(`be/src/olap/rowset/segment_v2/ann_index/faiss_ann_index.cpp`) 
[[1]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533R49-R123)
[[2]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533L157-R233)
[[3]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533L172-R258)
   
   * Updated the configuration for `omp_threads_limit` to allow automatic 
adjustment based on available CPU cores (defaulting to 80% of cores if set to 
-1), with validation logic to enforce sensible limits. 
(`be/src/common/config.cpp`, `be/src/common/config.h`) 
[[1]](diffhunk://#diff-b626e6ab16bc72abf40db76bf5094fcc8ca3c37534c2eb83b63b7805e1b601ffL1584-R1597)
[[2]](diffhunk://#diff-46e8c1ada0d43acf8c2965e46e90909089aada1f46531976c10605b837f8da3dL1641-R1642)
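
   The budget idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_omp_threads_limit`, `OmpThreadBudget`, and `ScopedOmpThreadBudgetSketch` are hypothetical names, and treating every non-positive setting as "auto" is an assumption (the PR only states that -1 maps to 80% of cores). A real build would presumably pass the granted count to `omp_set_num_threads` before calling into FAISS.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: a positive setting is used as-is; anything else is
// treated as "auto" (assumption), i.e. 80% of the cores, but at least 1.
inline int resolve_omp_threads_limit(int configured, int num_cores) {
    if (configured > 0) return configured;
    return std::max(1, num_cores * 80 / 100);
}

// Hypothetical process-wide budget shared by all concurrent index builds.
class OmpThreadBudget {
public:
    explicit OmpThreadBudget(int limit) : _limit(std::max(1, limit)) {}

    // Blocks until at least one thread is free, then grants up to `wanted`
    // threads without ever pushing the total above the limit.
    int acquire(int wanted) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _in_use < _limit; });
        const int granted = std::min(std::max(1, wanted), _limit - _in_use);
        _in_use += granted;
        return granted;
    }

    // Returns previously granted threads to the budget and wakes waiters.
    void release(int granted) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            assert(granted <= _in_use);
            _in_use -= granted;
        }
        _cv.notify_all();
    }

    // Current reservation; a gauge metric could report this value.
    int in_use() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _in_use;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    const int _limit;
    int _in_use = 0;
};

// RAII guard: reserves threads on construction, releases them on destruction.
class ScopedOmpThreadBudgetSketch {
public:
    ScopedOmpThreadBudgetSketch(OmpThreadBudget& budget, int wanted)
            : _budget(budget), _granted(budget.acquire(wanted)) {}
    ~ScopedOmpThreadBudgetSketch() { _budget.release(_granted); }

    ScopedOmpThreadBudgetSketch(const ScopedOmpThreadBudgetSketch&) = delete;
    ScopedOmpThreadBudgetSketch& operator=(const ScopedOmpThreadBudgetSketch&) = delete;

    // Number of threads this build may use, e.g. for omp_set_num_threads().
    int granted() const { return _granted; }

private:
    OmpThreadBudget& _budget;
    const int _granted;
};
```

   With a limit of 8, a first build asking for 6 threads gets 6, a concurrent second build asking for 6 gets the remaining 2, and both reservations return to the budget when their guards go out of scope.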
   
   **Monitoring and Debugging Enhancements:**
   
   * Added a new metric (`ann_index_build_index_threads`) to monitor the number 
of threads reserved for index builds, and registered it in the metrics system. 
(`be/src/util/doris_metrics.cpp`, `be/src/util/doris_metrics.h`) 
[[1]](diffhunk://#diff-878c2670a099f34646a2d514e473068af8a5c43784e0450555a2e2b79e8f27caR253)
[[2]](diffhunk://#diff-878c2670a099f34646a2d514e473068af8a5c43784e0450555a2e2b79e8f27caR422)
[[3]](diffhunk://#diff-86758ef39527aa2e06d142689e6b13e5a89872ca7aaffb398d1e491749efbd52R261)
   
   * Introduced a scoped thread renaming utility (`ScopedThreadName`) for FAISS 
build phases, making them easier to identify in debuggers and logs. 
(`be/src/olap/rowset/segment_v2/ann_index/faiss_ann_index.cpp`) 
[[1]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533R49-R123)
[[2]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533L157-R233)
[[3]](diffhunk://#diff-db0f703076eba6a2e532676979b1bfb928a2ffda3123593779d49b10b143d533L172-R258)
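
   A scoped rename of the calling thread could look roughly like the sketch below. It is an illustration under stated assumptions, not the `ScopedThreadName` from this PR: it is Linux-only, uses `prctl` with `PR_SET_NAME`/`PR_GET_NAME`, and the names `current_thread_name` and `ScopedThreadNameSketch` are hypothetical. Linux thread names are limited to 15 characters plus the terminating NUL, so longer names are truncated.

```cpp
#include <sys/prctl.h>

#include <string>

// Reads the name of the calling thread (Linux-only; at most 15 chars + NUL).
inline std::string current_thread_name() {
    char buf[16] = {0};
    prctl(PR_GET_NAME, buf, 0, 0, 0);
    return std::string(buf);
}

// Hypothetical RAII helper: renames the calling thread for the lifetime of
// the object and restores the previous name on destruction.
class ScopedThreadNameSketch {
public:
    explicit ScopedThreadNameSketch(const std::string& name)
            : _previous(current_thread_name()) {
        // Truncate explicitly so the stored name is predictable.
        const std::string truncated = name.substr(0, 15);
        prctl(PR_SET_NAME, truncated.c_str(), 0, 0, 0);
    }

    ~ScopedThreadNameSketch() {
        prctl(PR_SET_NAME, _previous.c_str(), 0, 0, 0);
    }

    ScopedThreadNameSketch(const ScopedThreadNameSketch&) = delete;
    ScopedThreadNameSketch& operator=(const ScopedThreadNameSketch&) = delete;

private:
    const std::string _previous;
};
```

   Wrapping a build phase in `ScopedThreadNameSketch scoped("faiss_build");` makes the calling thread show up under that name in tools such as `top -H` or a debugger, and the original name comes back once the scope ends.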
   
   **Codebase Maintenance:**
   
   * Added necessary includes for threading, synchronization, and algorithms to 
support new resource management logic. 
(`be/src/olap/rowset/segment_v2/ann_index/faiss_ann_index.cpp`)
   
   An illustration of the thread name change.
   Before this change:
   
![img_v3_02qu_6d47b18b-f7a2-4ad5-a4ad-ff60a849272g](https://github.com/user-attachments/assets/f90ef70e-13ae-49d0-a590-3b1ceeffcd27)
   After this change, `faiss_build/train_idx` is visible:
   
![img_v3_02qu_1f46db78-bdfd-4e59-a174-7d91fcb8547g](https://github.com/user-attachments/assets/d7e5de95-a1f8-40af-b64d-2dc246869e54)
   
   
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merges this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

