Just a personal anecdote that might be worth something. On both my AMD-chipset motherboards (X570/X670E ProArt WiFi), I had been getting microstutters and occasional odd hangs for the last year or so, and reboots would often power off rather than power cycle, which I mostly wrote off as an oddity of the mobo. I had a PSU blow (less than 2 years in) on that build, which I put down to winter peak power being hot in NZ (I measure 247V off the grid through the UPS).
It was a beQuiet 12 Pro 1000W, RMA'd and replaced with a 1300W beQuiet Pro, which went BANG! after two days. After isolating the circuit and removing it from the UPS, I went through another 2 beQuiet Pro 1300W units within a week, each with the same bang (FET exploding) after a couple of days of working. For the 4th one I switched to a Corsair and it's been fine since.

Turns out there is some issue with that particular power supply brand and compatibility with AMD chipsets, which is not a thing I was expecting to find.

-Joel

On Wed, 19 Jul 2023 at 09:27, Kastus Shchuka <open...@tprfct.net> wrote:

> On Tue, Jul 18, 2023 at 08:09:11PM +0100, cho...@jtan.com wrote:
> > Not really. But.
> >
> > I have an APU2 which runs two VMs that do practically nothing,
> > although the box itself is used actively. The VMs consistently, and
> > without warning, hang in a way which matches the description "nothing
> > new can be execed" although I recall being able to log in on the
> > console. I noticed shortly after I installed the VMs in around May
> > but I haven't got very far diagnosing it because it's a low priority.
> > However there is a common denominator: AMD
> >
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: AMD G-T40E Processor, 1000.02 MHz, 14-02-00
> > cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
> > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: smt 0, core 0, package 0
> >
> > Times two.
> >
> > As you say the existing processes seem to work fine right up until
> > sshd is nearly (but not quite?) ready to fork:
> >
> > .
> > .
> > .
> > debug1: SSH2_MSG_EXT_INFO received
> > debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,sk-ssh-ed25...@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp...@openssh.com,webauthn-sk-ecdsa-sha2-nistp...@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
> > debug1: kex_input_ext_info: publickey-hostbo...@openssh.com=<0>
> > debug1: SSH2_MSG_SERVICE_ACCEPT received
> >
> > Ordinarily it would next attempt authentication. Does sshd fork and
> > drop privileges to do that?
> >
> > I don't know if that could help or even if it's related, but it can
> > be reproduced with confidence. I can poke the box or its VMs any
> > way that could shake some data loose.
> >
> > Matthew
> >
>
> Is AMD errata referenced from https://inks.tedunangst.com/l/4996 any
> relevant?
> (errata #1474 in
> https://www.amd.com/system/files/TechDocs/56323-PUB_1.01.pdf)
>
> -Kastus
>