Thanks for the thoughts.

I've got enough clutter around not to need to spend several hundred
dollars for a test card.

I also have no idea how to compile a new kernel, but I do know that
being asked to confirms someone goofed.

Anyway, maybe something else is afoot.  Some noodling follows; please
let me know what you think:

For starters:  I've noticed that sox fails periodically.  I've got a
couple of scripts that each play a brief OGG system sound every half
hour, one on the hour and half hour, the other five minutes before, to
pace my day.  Actually, the first is set for 3 seconds past :00, to
avoid conflicts.  A .timer triggers a .service, which runs a shell
script whose one working line calls play (a sox command).

This works until it doesn't, at which point errors like the following
are logged (others that differ are further down).  What exactly starts
tripping it up?

Apr 27 19:00:03 greystone systemd[1]: Starting Play sounds | pomo-in.service...
Apr 27 19:00:03 greystone bash[31026]: DIRFILE is /srv/greystone-data/users/eric/data/computer/media/sounds/chirps-notes/service-login.oga.
Apr 27 19:00:03 greystone bash[31027]: ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
Apr 27 19:00:03 greystone kernel: traps: play[31027] trap divide error ip:7ff27953a37a sp:7ffc76cb7fa0 error:0 in libsox_fmt_ao.so[7ff27953a000+1000]
Apr 27 19:00:03 greystone bash[31026]: /home/locsh/pomo-in.sh: line 38: 31027 Floating point exception(core dumped) play $DIR$FILE
Apr 27 19:00:03 greystone systemd[1]: pomo-in.service: Main process exited, code=exited, status=136/n/a
Apr 27 19:00:03 greystone systemd[1]: pomo-in.service: Failed with result 'exit-code'.

Perhaps these offer some clues.
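
One clue can be read straight off the exit status.  By shell convention,
a status above 128 means the command was killed by signal (status - 128),
so 136 points to signal 8, which on Linux is SIGFPE and matches the
kernel's "trap divide error".  A minimal Python sketch of that
arithmetic (the helper name is mine, purely for illustration):

```python
import signal


def signal_from_exit_status(status: int) -> str:
    """Map a shell-style exit status (128 + N) to the name of signal N."""
    if status <= 128:
        raise ValueError("status does not encode a signal")
    return signal.Signals(status - 128).name


# 136 is the status systemd reported for pomo-in.service.
print(signal_from_exit_status(136))
```

So the "Floating point exception" is really an integer divide by zero
inside libsox_fmt_ao.so, apparently right after ALSA fails to open the
dmix slave; whether the one causes the other is my guess, not something
the log proves.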

Separately, xrandr-related message storms crop up (revise xorg.conf?):

Apr 27 19:00:35 greystone gsd-color[5002]: no xrandr-DVI-I-0 device found: Failed to find output xrandr-DVI-I-0

Another weird one (no such device exists):

Apr 27 19:15:32 greystone dbus-daemon[1033]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.88' (uid=1000 pid=4878 comm="/usr/bin/gnome-shell " label="unconfined")
Apr 27 19:15:33 greystone systemd[1]: Starting Fingerprint Authentication Daemon...
Apr 27 19:15:33 greystone dbus-daemon[1033]: [system] Successfully activated service 'net.reactivated.Fprint'
Apr 27 19:15:33 greystone systemd[1]: Started Fingerprint Authentication Daemon.

Then (who/what is doing this, and why?):

Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 31 with keysym 31 (keycode a).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 32 with keysym 32 (keycode b).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 33 with keysym 33 (keycode c).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 34 with keysym 34 (keycode d).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 35 with keysym 35 (keycode e).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 38 with keysym 38 (keycode 11).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 39 with keysym 39 (keycode 12).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 36 with keysym 36 (keycode f).
Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: Overwriting existing binding of keysym 37 with keysym 37 (keycode 10).

Then, lots of these, followed by iptables reporting a bunch of outbound
UDP/443 drops (who's the second user?):

Apr 27 19:16:16 greystone rtkit-daemon[1645]: Supervising 5 threads of 3 processes of 2 users.

The pulseaudio storms follow this, which I really don't follow (a socket
client?  Doing what?):

Apr 27 19:17:07 greystone pulseaudio[4650]: Created 15 "Native client (UNIX socket client)"
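
To put numbers on these "storms" rather than eyeballing them, the syslog
lines can be tallied per program per minute.  This is a rough sketch
only: the function name, the threshold and the sample list are my own
choices, and it assumes the traditional "Mon DD HH:MM:SS host
program[pid]:" line layout shown in the excerpts here.

```python
import re
from collections import Counter

# Matches "Mon DD HH:MM:SS host program[pid]:" at the start of a syslog line.
SYSLOG_RE = re.compile(
    r"^\w{3}\s+\d+ (\d\d:\d\d):\d\d \S+ ([^\[\s:]+)(?:\[\d+\])?:"
)


def storms(lines, threshold=3):
    """Return {(HH:MM, program): count} for buckets at or above threshold."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# A few lines copied from the excerpts in this message.
SAMPLE = [
    "Apr 27 19:15:33 greystone systemd[1]: Started Fingerprint "
    "Authentication Daemon.",
    "Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: "
    "Overwriting existing binding of keysym 31 with keysym 31 (keycode a).",
    "Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: "
    "Overwriting existing binding of keysym 32 with keysym 32 (keycode b).",
    "Apr 27 19:15:35 greystone gnome-shell[4878]: Window manager warning: "
    "Overwriting existing binding of keysym 33 with keysym 33 (keycode c).",
]

print(storms(SAMPLE))
```

Fed the whole of /var/log/syslog instead of the sample list, with a
higher threshold, this should show which program is noisy in the minutes
before each freeze.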

Then, a storm of samba login attempts for five minutes, God knows how or
from where because this is walled off, e.g.,

Apr 27 17:49:00 greystone smbd[29697]: [2022/04/27 17:49:00.868861,  1, pid=29697] ../../lib/param/loadparm.c:1870(lpcfg_do_global_parameter)
Apr 27 17:49:00 greystone smbd[29697]: pam_unix(samba:session): session closed for user nobody

Then a sox crash:

Apr 27 18:55:00 greystone systemd[1]: Starting Play sounds | pomo-out.service...
Apr 27 18:55:00 greystone bash[30988]: ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
Apr 27 18:55:00 greystone kernel: traps: play[30988] trap divide error ip:7f2bd05b737a sp:7ffd64efe500 error:0 in libsox_fmt_ao.so[7f2bd05b7000+1000]
Apr 27 18:55:00 greystone systemd[1]: Starting Process error reports when automatic reporting is enabled...
Apr 27 18:55:00 greystone bash[30987]: /home/locsh/pomo-out.sh: line 38: 30988 Floating point exception(core dumped) play $DIR$FILE
Apr 27 18:55:00 greystone systemd[1]: pomo-out.service: Main process exited, code=exited, status=136/n/a
Apr 27 18:55:00 greystone systemd[1]: pomo-out.service: Failed with result 'exit-code'.
Apr 27 18:55:00 greystone systemd[1]: Failed to start Play sounds | pomo-out.service.
Apr 27 18:55:00 greystone whoopsie-upload-all[30991]: /var/crash/_usr_bin_sox.0.crash already marked for upload, skipping
Apr 27 18:55:00 greystone whoopsie-upload-all[30991]: /var/crash/_usr_lib_vmware_bin_appLoader.0.crash already marked for upload, skipping
Apr 27 18:55:00 greystone whoopsie-upload-all[30991]: All reports processed
Apr 27 18:55:00 greystone systemd[1]: apport-autoreport.service: Succeeded.
Apr 27 18:55:00 greystone systemd[1]: Finished Process error reports when automatic reporting is enabled.
Apr 27 18:55:39 greystone rsyslogd[1199]: -- MARK --
Apr 27 18:57:22 greystone gnome-shell[4878]: cr_parser_new_from_buf: assertion 'a_buf && a_len' failed
Apr 27 18:57:22 greystone gnome-shell[4878]: cr_declaration_parse_list_from_buf: assertion 'parser' failed

Another sox crash, followed by an xrandr storm (same log as the 18:55:00
excerpt above).

This is interesting:  Pulseaudio logs a sink input (a playback stream)
and the very next entries, about three minutes later, are iptables
dropping invalid outbound tcp/443 packets:

Apr 27 21:12:27 greystone pulseaudio[4650]: Sink input 89: proplist[media.name]: (data) -> (data)
Apr 27 21:15:23 greystone kernel: [ipt-invalid-out DROP] IN= OUT=lan0 SRC=192.168.1.2 DST=69.147.88.8 LEN=40 TOS=0x00 PREC=0x20 TTL=64 ID=0 DF PROTO=TCP SPT=36194 DPT=443 WINDOW=0 RES=0x00 RST URGP=0
Apr 27 21:15:41 greystone rsyslogd[1199]: -- MARK --
Apr 27 21:15:43 greystone kernel: [ipt-invalid-out DROP] IN= OUT=lan0 SRC=192.168.1.2 DST=69.147.88.8 LEN=40 TOS=0x00 PREC=0x20 TTL=64 ID=0 DF PROTO=TCP SPT=36196 DPT=443 WINDOW=0 RES=0x00 RST URGP=0
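
Pulling the fields out of one of these DROP lines makes it easier to see
what kind of packet is being dropped.  A minimal sketch, assuming the
usual netfilter LOG layout of KEY=value tokens plus bare flag words
after the bracketed prefix (the function name is hypothetical):

```python
LINE = (
    "Apr 27 21:15:23 greystone kernel: [ipt-invalid-out DROP] IN= OUT=lan0 "
    "SRC=192.168.1.2 DST=69.147.88.8 LEN=40 TOS=0x00 PREC=0x20 TTL=64 ID=0 DF "
    "PROTO=TCP SPT=36194 DPT=443 WINDOW=0 RES=0x00 RST URGP=0"
)


def parse_netfilter_line(line):
    """Split a netfilter LOG line into KEY=value fields and bare flags."""
    payload = line.split("] ", 1)[1]  # everything after the "[prefix] " tag
    fields, flags = {}, []
    for token in payload.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
        else:
            flags.append(token)  # bare words such as DF or RST
    return fields, flags


fields, flags = parse_netfilter_line(LINE)
print(fields["DST"], fields["DPT"], fields["LEN"], flags)
```

LEN=40 is a 20-byte IP header plus a 20-byte TCP header with no payload,
and the only TCP flag set is RST.  So these look like this machine
sending bare resets for HTTPS connections that conntrack no longer
recognizes, which is my reading and not something I have confirmed.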

Another sox crash:

Apr 27 21:25:00 greystone kernel: traps: play[46530] trap divide error ip:7f44b374637a sp:7ffe381abb20 error:0 in libsox_fmt_ao.so[7f44b3746000+1000]
Apr 27 21:25:00 greystone bash[46530]: ALSA lib pcm_dmix.c:1089:(snd_pcm_dmix_open) unable to open slave
Apr 27 21:25:00 greystone bash[46529]: /home/locsh/pomo-out.sh: line 38: 46530 Floating point exception(core dumped) play $DIR$FILE
Apr 27 21:25:00 greystone systemd[1]: pomo-out.service: Main process exited, code=exited, status=136/n/a
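
Every one of these crashes lands in libsox_fmt_ao.so, i.e. sox's libao
output handler.  One thing I may try is steering play away from that
handler with the AUDIODRIVER environment variable described in sox(1).
A sketch only, untested here: it assumes the sox pulseaudio handler is
installed (I believe Ubuntu packages it as libsox-fmt-pulse), and the
function names are made up for illustration.

```python
import os
import subprocess


def play_env(driver="pulseaudio", base=None):
    """Copy the environment and set AUDIODRIVER for sox's play command."""
    env = dict(os.environ if base is None else base)
    env["AUDIODRIVER"] = driver
    return env


def play_sound(path, driver="pulseaudio"):
    """Run `play -q PATH` with the chosen driver and return its exit status.

    subprocess reports death by signal as a negative number, so a repeat
    of the SIGFPE crash would show up here as -8 rather than 136.
    """
    result = subprocess.run(["play", "-q", path], env=play_env(driver))
    return result.returncode


# Example call (not run here):
#   status = play_sound("/path/to/sound.oga")
```

If the crash disappears with a different driver, that would point at
the ao handler's failure path rather than at the sound file or the timer.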

Then, why does fwupd run?:

Apr 27 21:26:58 greystone systemd[1]: Starting Refresh fwupd metadata and update motd...
Apr 27 21:26:58 greystone systemd[1]: fwupd-refresh.service: Main process exited, code=exited, status=1/FAILURE
Apr 27 21:26:58 greystone systemd[1]: fwupd-refresh.service: Failed with result 'exit-code'.
Apr 27 21:26:58 greystone systemd[1]: Failed to start Refresh fwupd metadata and update motd.

And, nvidia-modeset errors (why do these run if nvidia-modeset=0?)
followed by an invalid outbound tcp/443 packet.  Very weird:

Apr 27 21:43:00 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0:1079
Apr 27 21:43:02 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917c:0:0:1095
Apr 27 21:43:04 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917c:1:0:1095
Apr 27 21:43:21 greystone kernel: [ipt-invalid-out DROP] IN= OUT=lan0 SRC=192.168.1.2 DST=204.237.133.116 LEN=40 TOS=0x00 PREC=0x20 TTL=64 ID=0 DF PROTO=TCP SPT=38970 DPT=443 WINDOW=0 RES=0x00 RST URGP=0

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to nvidia-graphics-drivers-470 in Ubuntu.
https://bugs.launchpad.net/bugs/1970072

Title:
  [nvidia] Frequent GPU Resets (and GUI Freeze) by gsd-media-keys

Status in nvidia-graphics-drivers-470 package in Ubuntu:
  New

Bug description:
  A fresh install of Ubuntu 20.04.4 has GUI freezes frequently (i.e., a
  dozen times daily) at inconsistent intervals and times, and not
  obviously in response to any particular user input.  The system then
  is unresponsive to user input, although the mouse cursor does continue
  to move about.

  The syslog messages reveal the message from gsd-media-keys "[GFX1]:
  Device reset due to WR context"

  The freeze also is preceded by often hundreds of pulseaudio messages
  related to latency issues, notwithstanding that no audio is playing.
  Often, gdm-x-session messages appear referencing the NVIDIA GPU and
  "WAIT".  The pulseaudio message "setting avail_min=87496" often is the
  last message and if not, one of the last, before the reboot.

  Also common before is a gnome-shell message "Ignored exception from
  dbus method: Gio.DBusError:
  GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name
  com.gonzaarcr.appmenu was not provided by any .service files", which
  precedes the pulseaudio messages.

  Happy to provide any other information but would welcome some guidance
  because I'm on day 11 or 12 of this installation and wearing out my
  welcome at Google.

  Ubuntu 20.04.4 LTS

  apt-cache returns:
    Installed: 3.36.9-0ubuntu0.20.04.2
    Candidate: 3.36.9-0ubuntu0.20.04.2
    Version table:
   *** 3.36.9-0ubuntu0.20.04.2 500
          500 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
          100 /var/lib/dpkg/status
       3.36.4-1ubuntu1~20.04.2 500
          500 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages
       3.36.1-5ubuntu1 500
          500 http://us.archive.ubuntu.com/ubuntu focal/main amd64 Packages

  Expected behavior:  Stability, especially from the fourth release of an
  LTS.

  Behavior:  Persistent GUI freezes

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: gnome-shell 3.36.9-0ubuntu0.20.04.2
  ProcVersionSignature: Ubuntu 5.13.0-40.45~20.04.1-generic 5.13.19
  Uname: Linux 5.13.0-40-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu27.23
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Sat Apr 23 16:26:45 2022
  DisplayManager: gdm3
  GsettingsChanges:
   
  InstallationDate: Installed on 2022-04-09 (14 days ago)
  InstallationMedia: Ubuntu 20.04.3 LTS "Focal Fossa" - Release amd64 (20210819)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  RelatedPackageVersions: mutter-common 3.36.9-0ubuntu0.20.04.2
  SourcePackage: gnome-shell
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-470/+bug/1970072/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
