hjwsm1989 opened a new issue, #3277:
URL: https://github.com/apache/brpc/issues/3277
### Describe the bug
Since commit 12fb539a ("Use monotonic time instead of wall time" / #3268,
merged around 1.17.0-rc1), the bvar sampler thread no longer sleeps between
sampling rounds and prints
sampler.cpp:192 bvar is busy at sampling for 2 seconds!
continuously (tens of thousands of log lines per minute on a process with a
moderate number of bvars).
Root cause is a unit-conversion mistake in
`src/bvar/detail/sampler.cpp::SamplerCollector::run()`:
```cpp
while (!_stop) {
int64_t abstime = butil::cpuwide_time_ns(); // ns (was
gettimeofday_us() / us)
...
int64_t now = butil::cpuwide_time_ns(); // ns
_cumulated_time_us += now - abstime;
abstime += 1000000L; // BUG: + 1 ms, not + 1 s
while (abstime > now) {
::usleep(abstime - now); // BUG: usleep() takes us,
value is ns
slept = true;
now = butil::cpuwide_time_ns();
}
if (slept) { consecutive_nosleep = 0; }
else if (++consecutive_nosleep >= WARN_NOSLEEP_THRESHOLD) { LOG(WARNING)
... }
}
```
PR #3268 switched the three time-source calls from `gettimeofday_us()`
(microseconds) to `cpuwide_time_ns()` (nanoseconds), but left `abstime +=
1000000L` and `::usleep(abstime - now)` untouched. The intended "sample once
per second" becomes "advance abstime by 1 ms"; the inner `while` loop almost
never enters (so `slept` stays `false`), and after two consecutive nosleep
iterations the warning fires. In practice the sampler runs at ~1 kHz instead of
1 Hz, burning CPU and flooding logs.
### To Reproduce
1. Build any service linking brpc >= 1.17.0-rc1 (commit 12fb539a or newer).
2. Register any bvar (e.g. `bvar::LatencyRecorder`).
3. Start the process and tail its log:
```bash
grep "bvar is busy at sampling" <logfile> | wc -l
```
Observed: tens of thousands of warnings per minute.
### Expected behavior
The sampler runs once per second; no "busy" warning unless the host is truly
overloaded (the prior <=1.16.x behavior).
### Versions
- brpc: 1.17.0-rc1 / current master (commit 12fb539a and descendants)
- OS: Linux 5.14 (x86_64)
- Compiler: gcc 12, C++20
### Fix
Follow-up PR to be linked below.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]