On Sat, Feb 29, 2020 at 07:41:59AM -0800, Justin Noor wrote:
> Awesome - thank you for your time and for the valuable information.
> 
> That’s hilarious about the serial port. I’ll try plugging into a switch,
> reproducing the crash, and SSHing into it. I still haven’t tried the
> syslogd tip you mentioned either. It’s time for me to start learning more
> about X. Will be in touch.
> 
> Regards
> 
> On Fri, Feb 28, 2020 at 6:57 AM Stuart Longland <stua...@longlandclan.id.au>
> wrote:
> 
> > On 28/2/20 11:32 pm, Justin Noor wrote:
> > > Thanks for offering to help and sorry for the delay - I got dragged into a
> > > work emergency. I finally managed to SCP my dmesg to a remote machine.
> >
> > Heh, no problems, these things happen.
> >
> > > As a refresher I have a 6.6 current machine that crashes when X is running,
> > > and almost instantly when Firefox is running - it runs fine without X. The
> > > machine becomes totally frozen - I have to perform a forced shutdown to
> > > exit this state. The issue appears to be graphics related and is
> > > inconsistent - sometimes it crashes immediately, other times it does not.
> >
> > Sometimes it might be the way a particular graphics toolkit "tickles"
> > the video hardware too.  For instance FVWM uses libxcb for drawing
> > graphics which means you're likely to be just working with 2D primitives.
> >
> > Then Firefox with its GTK+ back-end fires off a few RENDER extension
> > requests to the X server and whoopsie!  Down she goes!
> >
> > > There are indeed some "unknown product" messages related to my PCI graphics
> > > card in my dmesg, but I haven't been able to decipher them yet. Those
> > > usually mean the device is not supported, but it is, and I'm sure I have
> > > the correct driver (amdgpu0). Previously I had no issues for months, which
> > > is why I suspected hardware failure. Admittedly I've been lucky with
> > > graphics cards over the years, and don't know much about PCI.
> >
> > No issues for months running a previous version of OpenBSD, or the same
> > version you're running now?
> >
> > One suggestion I made too was to maybe try setting up a serial console
> > link… turns out the motherboard makers know how to tease:
> >
> > > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > > com0: probed fifo depth: 0 bytes
> >
> > That says there is an RS-232 port somewhere… so I had a look at the
> > handbook:
> >
> > https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_STRIX_B450-I_GAMING/E14337_ROG_STRIX_B450-I_GAMING_UM_PRINT.pdf
> >
> > They didn't wire it up to a pin header, which is annoying.
> >
> > On the video front, I did see this:
> > > initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1002:0x0B04 0xE5).
> > > amdgpu_irq_add_domain: stub
> > > amdgpu_device_resize_fb_bar: stub
> > > amdgpu: [powerplay] Failed to retrieve minimum clocks.
> > > amdgpu0: 1360x768, 32bpp
> > > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0
> > > wskbd1: connecting to wsdisplay0
> > > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> >
> > The "stub" messages make me wonder if we're hitting some
> > not-yet-implemented features.  That "failed to retrieve minimum clocks"
> > has been seen on Linux as well, and there it was related to PCI prefetch
> > register programming.
> >
> > The machine you've got isn't much different to what I have at work
> > actually: Ryzen 7 1700 (so previous generation), and an RX550 video card
> > (POLARIS12, maybe slightly newer?)… the machine is fitted with an RS-232
> > serial port so I might try a little experiment with a USB stick and see
> > if I can install OpenBSD 6.6 to USB storage and try to reproduce the crash.
> > --
> > Stuart Longland (aka Redhatter, VK4MSL)
> >
> > I haven't lost my mind...
> >   ...it's backed up on a tape somewhere.
> >

Hello Justin and Stuart,

The errors that I have found in /var/log/messages* may be unrelated to
the above. Thoughts?

I have noticed that the freezes on this machine happen sooner when I am
working within tmux(1), as I was at the time of the last freeze. That
may be sheer coincidence.

$ grep ERROR /var/log/messag*
/var/log/messages:Mar  8 16:20:10 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=385, emitted seq=387
/var/log/messages:Mar  9 07:06:34 gx470 /bsd: [drm] *ERROR* Illegal register access in command stream
/var/log/messages:Mar  9 07:06:44 gx470 /bsd: [drm] *ERROR* ring gfx timeout, signaled seq=794, emitted seq=796
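
One caveat on the grep above: on a default OpenBSD install, newsyslog(8)
gzip-compresses rotated logs (messages.0.gz and so on), so a plain grep
will not find errors in them. Below is a rough sketch, not tested on
OpenBSD, that also reads the compressed files. It assumes Python 3 has
been installed from packages (it is not in base), the log paths shown
above, and the exact [drm] message wording. It also assumes the amdgpu
convention that "signaled seq" is the last fence the GPU completed and
"emitted seq" the last one submitted, so their difference is the number
of jobs still outstanding when the timeout fired.

```python
#!/usr/bin/env python3
"""Sketch: scan current and rotated syslog files for DRM errors.

Assumes OpenBSD-style rotation (/var/log/messages plus gzip-compressed
messages.N.gz files); paths and message wording are assumptions.
"""
import glob
import gzip
import re
import sys

# Matches e.g. "[drm] *ERROR* Illegal register access in command stream"
DRM_ERROR = re.compile(r"\[drm\] \*ERROR\* (?P<msg>.*)")
# Matches e.g. "ring gfx timeout, signaled seq=385, emitted seq=387"
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def open_log(path):
    """Open a log file as text, decompressing .gz files transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def drm_errors(lines):
    """Yield (line, pending) for every DRM error line.

    pending is emitted - signaled for ring timeouts (assumed to be the
    number of jobs outstanding when the timeout fired), otherwise None.
    """
    for line in lines:
        m = DRM_ERROR.search(line)
        if not m:
            continue
        t = RING_TIMEOUT.search(m.group("msg"))
        pending = int(t.group("emitted")) - int(t.group("signaled")) if t else None
        yield line.rstrip("\n"), pending


def main(pattern="/var/log/messages*"):
    for path in sorted(glob.glob(pattern)):
        try:
            with open_log(path) as f:
                for line, pending in drm_errors(f):
                    note = f"  [{pending} job(s) pending]" if pending is not None else ""
                    print(f"{path}: {line}{note}")
        except (OSError, EOFError) as exc:  # unreadable file or damaged gzip data
            print(f"{path}: skipped ({exc})", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages*")
```

Run it as root (the script name, e.g. drm_errors.py, is up to you), or
pass a different glob as the first argument to point it at copies of
the logs taken from the frozen machine.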

My machine's last freeze occurred at the time of the last error in
/var/log/messages. I am able to log in to this machine remotely and
access files while it is frozen, using kermit(1) and a USB-to-serial
adapter. The machine's /var/run/dmesg.boot can be found in my first
email to this thread.

Regards Avon

-- 
aer
