hjwsm1989 opened a new issue, #3277:
URL: https://github.com/apache/brpc/issues/3277

   ### Describe the bug
   
   Since commit 12fb539a ("Use monotonic time instead of wall time" / #3268, 
merged around 1.17.0-rc1), the bvar sampler thread no longer sleeps between 
sampling rounds and prints
   
       sampler.cpp:192 bvar is busy at sampling for 2 seconds!
   
   continuously (tens of thousands of log lines per minute on a process with a 
moderate number of bvars).
   
   Root cause is a unit-conversion mistake in 
`src/bvar/detail/sampler.cpp::SamplerCollector::run()`:
   
   ```cpp
   while (!_stop) {
       int64_t abstime = butil::cpuwide_time_ns();   // ns  (was 
gettimeofday_us() / us)
       ...
       int64_t now = butil::cpuwide_time_ns();       // ns
       _cumulated_time_us += now - abstime;
       abstime += 1000000L;                          // BUG: + 1 ms, not + 1 s
       while (abstime > now) {
           ::usleep(abstime - now);                  // BUG: usleep() takes us, 
value is ns
           slept = true;
           now = butil::cpuwide_time_ns();
       }
       if (slept) { consecutive_nosleep = 0; }
       else if (++consecutive_nosleep >= WARN_NOSLEEP_THRESHOLD) { LOG(WARNING) 
... }
   }
   ```
   
   PR #3268 switched the three time-source calls from `gettimeofday_us()` 
(microseconds) to `cpuwide_time_ns()` (nanoseconds), but left `abstime += 
1000000L` and `::usleep(abstime - now)` untouched. The intended "sample once 
per second" becomes "advance abstime by 1 ms"; the inner `while` loop almost 
never enters (so `slept` stays `false`), and after two consecutive nosleep 
iterations the warning fires. In practice the sampler runs at ~1 kHz instead of 
1 Hz, burning CPU and flooding logs.
   
   ### To Reproduce
   
   1. Build any service linking brpc >= 1.17.0-rc1 (commit 12fb539a or newer).
   2. Register any bvar (e.g. `bvar::LatencyRecorder`).
   3. Start the process and tail its log:
   
      ```bash
      grep "bvar is busy at sampling" <logfile> | wc -l
      ```
   
      Observed: tens of thousands of warnings per minute.
   
   ### Expected behavior
   
   The sampler runs once per second; no "busy" warning unless the host is truly 
overloaded (the prior <=1.16.x behavior).
   
   ### Versions
   
   - brpc: 1.17.0-rc1 / current master (commit 12fb539a and descendants)
   - OS: Linux 5.14 (x86_64)
   - Compiler: gcc 12, C++20
   
   ### Fix
   
   Follow-up PR to be linked below.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to