chenBright commented on code in PR #3009:
URL: https://github.com/apache/brpc/pull/3009#discussion_r2173632622


##########
src/bthread/task_control.cpp:
##########
@@ -524,12 +525,29 @@ void TaskControl::print_rq_sizes(std::ostream& os) {
 
 double TaskControl::get_cumulated_worker_time() {
     int64_t cputime_ns = 0;
+    int64_t now = butil::cpuwide_time_ns();
     BAIDU_SCOPED_LOCK(_modify_group_mutex);
     for_each_task_group([&](TaskGroup* g) {
         if (g) {
-            cputime_ns += g->_cumulated_cputime_ns;
+            // With the acquire-release atomic operation, the CPU time of the bthread is
+            // only calculated once through `_cumulated_cputime_ns' or `_last_run_ns'.
+            cputime_ns += g->_cumulated_cputime_ns.load(butil::memory_order_acquire);
+            // The bthread is still running on the worker,
+            // so we need to add the elapsed time since it started.
+            // In extreme cases, before getting `_last_run_ns_in_tc',
+            // `_last_run_ns_in_tc' may have been updated multiple times,
+            // and `cputime_ns' will miss some cpu time, which is ok.
+            int64_t last_run_ns = g->_last_run_ns;
+            if (last_run_ns > _last_get_cumulated_time_ns) {
+                g->_last_run_ns_in_tc = last_run_ns;
+                cputime_ns += now - last_run_ns;
+            } else if (last_run_ns == g->_last_run_ns_in_tc) {
+                // The bthread is still running on the same worker.
+                cputime_ns += now - last_run_ns;
+            }

Review Comment:
   https://rigtorp.se/isatomic/
   > On the latest CPU microarchitectures (Skylake and Zen 2) AVX/AVX2 128b/256b aligned loads and stores are atomic even though Intel and AMD officially doesn’t guarantee this.
   
   Perhaps we can use the implicit property of modern CPUs that 128-bit aligned load/store operations are atomic to implement atomic operations on the combination of `_cumulated_cputime_ns` and `_last_run_ns`.
   
   x86_64: `_mm_load_si128`, `_mm_store_si128`
   Armv8.4 and later: `vld1q_s64`, `vst1q_s64`
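   
   A rough sketch of what that could look like (illustrative only, not the PR's code: `CpuTimePair`, `load_pair` and `store_pair` are made-up names, and the atomicity here is an assumption, unofficial on x86_64 per the post above and only architecturally guaranteed on Arm with LSE2; a real implementation would also need the appropriate compiler barriers):
   
   ```cpp
   #include <cstdint>
   
   // 16-byte alignment is required for the single-instruction accesses below.
   struct alignas(16) CpuTimePair {
       int64_t cumulated_cputime_ns;
       int64_t last_run_ns;
   };
   
   #if defined(__x86_64__)
   #include <emmintrin.h>  // SSE2: _mm_load_si128 / _mm_store_si128
   
   // Sample thread: read both fields with one aligned 128-bit load.
   inline CpuTimePair load_pair(const CpuTimePair* src) {
       CpuTimePair out;
       __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src));
       _mm_store_si128(reinterpret_cast<__m128i*>(&out), v);
       return out;
   }
   
   // Worker thread: publish both fields with one aligned 128-bit store.
   inline void store_pair(CpuTimePair* dst, const CpuTimePair& v) {
       __m128i tmp = _mm_load_si128(reinterpret_cast<const __m128i*>(&v));
       _mm_store_si128(reinterpret_cast<__m128i*>(dst), tmp);
   }
   #elif defined(__aarch64__)
   #include <arm_neon.h>  // vld1q_s64 / vst1q_s64
   
   inline CpuTimePair load_pair(const CpuTimePair* src) {
       CpuTimePair out;
       vst1q_s64(reinterpret_cast<int64_t*>(&out),
                 vld1q_s64(reinterpret_cast<const int64_t*>(src)));
       return out;
   }
   
   inline void store_pair(CpuTimePair* dst, const CpuTimePair& v) {
       vst1q_s64(reinterpret_cast<int64_t*>(dst),
                 vld1q_s64(reinterpret_cast<const int64_t*>(&v)));
   }
   #endif
   ```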
   
   For scenarios that do not support 128-bit atomic operations, use a lock to guarantee atomicity instead. Only the worker thread and the sample thread acquire the lock, and the sample thread acquires it once a second. There is no lock contention between worker threads, so the locking overhead can be ignored.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@brpc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

