After googling, I tried a few things:

Checked that memory timings, frequencies and voltage are correct => no improvement

Kernel parameters => no improvement
- idle=nomwait
- processor.max_cstate=5
- rcu_nocbs=0-11
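To make sure these parameters actually reached the running kernel after a reboot, one can compare them against /proc/cmdline. A minimal Python sketch; the helper name `missing_params` is hypothetical, and it only checks for exact tokens (it does not interpret parameter values):

```python
from pathlib import Path

# Kernel parameters tried above.
WANTED = ["idle=nomwait", "processor.max_cstate=5", "rcu_nocbs=0-11"]


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not exact tokens of `cmdline`."""
    tokens = set(cmdline.split())
    return [p for p in wanted if p not in tokens]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    path = Path("/proc/cmdline")
    if not path.exists():
        print("/proc/cmdline not found (not running on Linux?)")
    else:
        missing = missing_params(path.read_text(), WANTED)
        print("not active: " + " ".join(missing) if missing else "all parameters active")
```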

Undervolting / Overclocking => seems to make the system a bit more
stable
- Reducing PPT to 45 W
- PBO Curve, all cores: -10
- Boost limit: -300 MHz (ending up around 4 GHz)

Deactivating SMT => no improvement

Deactivating selected CPUs (the error always showed up on CPU5) => no
improvement
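One way to take a single core such as CPU5 offline at runtime is the Linux CPU hotplug interface in sysfs: writing "0" to /sys/devices/system/cpu/cpuN/online takes the CPU offline and "1" brings it back (root required). This is an assumption about how it could be done, not necessarily how it was done here. A minimal Python sketch; the function name and the `sysfs_root` parameter are hypothetical, the latter only so the function can be pointed at a scratch directory:

```python
from pathlib import Path


def set_cpu_online(cpu: int, online: bool,
                   sysfs_root: str = "/sys/devices/system/cpu") -> None:
    """Bring a logical CPU online or offline through its sysfs hotplug file."""
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    # "0" takes the CPU offline, "1" brings it back (needs root on a real system).
    control.write_text("1" if online else "0")
```

Usage, as root: `set_cpu_online(5, False)` to offline CPU5, `set_cpu_online(5, True)` to restore it.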

Deactivating tx, sg, tso offloading => no improvement
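Offload features like these are normally toggled with `ethtool -K <iface> tx off sg off tso off`. A small Python sketch that builds that command; the helper name and the interface name "enp5s0" are hypothetical placeholders, and the command is only printed, not executed:

```python
# Offload features that were switched off above (ethtool short names).
OFFLOADS = ["tx", "sg", "tso"]


def ethtool_disable_args(iface: str, features: list[str]) -> list[str]:
    """Build an `ethtool -K` argument list that turns each feature off."""
    args = ["ethtool", "-K", iface]
    for feature in features:
        args += [feature, "off"]
    return args


if __name__ == "__main__":
    # "enp5s0" is a placeholder; substitute the real NIC name.
    cmd = ethtool_disable_args("enp5s0", OFFLOADS)
    # Print only; passing cmd to subprocess.run(cmd, check=True) as root would apply it.
    print(" ".join(cmd))
```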

Overall, the system seems to crash on load changes, e.g. when
compiling. The crash then takes down SATA, networking, etc., leaving
the system unusable.
