I have a program, normal memory usage is <50MB and CPU ~5%. This doesn't change 
over time.

Rebuilding with `-race` shows memory <100MB and CPU ~25%. 
(Consistent with overhead described here: 
https://go.dev/doc/articles/race_detector#Runtime_Overheads)

However, with `-race` enabled, after a couple of minutes the CPU suddenly jumps 
to 100% and memory usage skyrockets to multiple GBs within seconds.

e.g.:

T0: CPU:25%, MEM:95MB
T1: CPU:25%, MEM:95MB
(...)
T100: CPU:25%, MEM:95MB
T101: CPU:99%, MEM:500MB
T102: CPU:99%, MEM:2GB
T103: CPU:99%, MEM:4GB
T104: CPU:99%, MEM:6GB
T105: CPU:99%, MEM:8GB
 => OOM

The CPU jump is drastic and instantaneous, and the memory seems to grow as fast 
as it can be allocated.

The race detector docs say:

> The race detector currently allocates an extra 8 bytes per defer and recover 
> statement. Those extra allocations are not recovered until the goroutine 
> exits. This means that if you have a long-running goroutine that is 
> periodically issuing defer and recover calls, the program memory usage may 
> grow without bound. These memory allocations will not show up in the output 
> of runtime.ReadMemStats or runtime/pprof.
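
For illustration, the pattern the docs seem to describe would look roughly like the sketch below: a long-running goroutine that keeps calling a function which uses defer/recover. The names `worker` and `handle` are hypothetical and not taken from my program; this only shows the shape of code to look for.

```go
package main

import "fmt"

// handle is a hypothetical per-item handler. Each call issues one
// defer and one recover, which is the pattern the race detector docs
// mention as costing extra memory until the goroutine exits.
func handle(j int) (out int) {
	defer func() {
		if r := recover(); r != nil {
			out = -1 // report failure instead of crashing the worker
		}
	}()
	if j < 0 {
		panic("negative job")
	}
	return j * 2
}

// worker is a long-running goroutine: it never exits while jobs is
// open, so anything tied to its lifetime is never released.
func worker(jobs <-chan int, results chan<- int) {
	for j := range jobs {
		results <- handle(j)
	}
	close(results)
}

func main() {
	jobs := make(chan int)
	results := make(chan int)
	go worker(jobs, results)
	go func() {
		for _, j := range []int{1, -1, 3} {
			jobs <- j
		}
		close(jobs)
	}()
	for r := range results {
		fmt.Println(r)
	}
}
```

If the leak really comes from this pattern, the suspects would be goroutines that live for the whole run and sit in a loop like `worker`, rather than short-lived ones.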

Here's my question. 

I would like to:
1. Confirm the extra memory is due to the race detector overhead related to 
defer/recover (as opposed to some other bug in the program that only surfaces 
when building with `-race`)
2. Find the goroutine(s) responsible for that defer/recover

Any idea on how to investigate?

I have tried capturing with pprof. Even if the race detector allocations are 
not visible ("These memory allocations will not show up in the output of 
runtime.ReadMemStats or runtime/pprof."), I could at least confirm it's not the 
program code allocating.

However, pprof does not work for a different reason: once the program is in 
"100% CPU" mode, pprof times out. 
So I can never capture a trace/heap/profile while the system is showing the 
behavior (because CPU and memory are already too pegged to handle a pprof dump).

Anything else I could try to get to the bottom of this?

For example, is there a way to trace all defer/recover calls?
Or is there a way to attach a debugger and pause when memory usage exceeds a 
certain amount?
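
Short of a debugger, something along these lines might approximate the "pause when memory exceeds X" idea from inside the process. This is only a sketch under assumptions: it is Linux-specific, it reads the resident set size from `/proc/self/status` (since `runtime.ReadMemStats` reportedly does not see these allocations), and the 200MB threshold, the polling interval, and the names `parseVmRSS` and `watchRSS` are all made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"runtime/pprof"
	"strconv"
	"strings"
	"time"
)

// parseVmRSS extracts the VmRSS value, in kB, from the text of
// /proc/<pid>/status.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS not found")
}

// watchRSS polls the process RSS. The first time it exceeds limitKB,
// it dumps all goroutine stacks to w and returns.
func watchRSS(limitKB int64, every time.Duration, w io.Writer) {
	t := time.NewTicker(every)
	defer t.Stop()
	for range t.C {
		b, err := os.ReadFile("/proc/self/status")
		if err != nil {
			return // not on Linux, or /proc unavailable
		}
		rss, err := parseVmRSS(string(b))
		if err != nil {
			return
		}
		if rss > limitKB {
			fmt.Fprintf(w, "RSS %d kB exceeds limit %d kB\n", rss, limitKB)
			pprof.Lookup("goroutine").WriteTo(w, 2)
			return
		}
	}
}

func main() {
	// Assumed threshold: about twice the steady-state usage under -race.
	go watchRSS(200*1024, 100*time.Millisecond, os.Stderr)
	// ... the real program would run here ...
	time.Sleep(300 * time.Millisecond)
}
```

Whether the dump completes in time once the CPU is pegged is an open question. A variation would be to stop the process at the threshold (for example by having it send itself SIGSTOP) so that a debugger can be attached, but that is equally untested.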

I searched for these and more, but couldn't find much. Maybe some wizard on 
this list has some ideas or pointers.

[go1.19.7.linux-amd64]

Thank you,
M.

