I am running Kubuntu 18.10 with kernel 4.18.0-11-generic on an AMD Ryzen
2700X CPU. I initially believed I had the Ryzen soft lockup issue, and I
posted about it in the AMD community forums:


https://community.amd.com/thread/225795#

But I later realized the AMD soft lockup issue is one that requires the
motherboard reset button to get out of. My issue is usually not so bad:
most of the time, SSH, the network and the VIRTUAL MACHINES inside my
server still work. I can use the following command via SSH to bring the
desktop back to life:

    sudo systemctl restart sddm

I am now inclined to suspect that the Linux kernel scheduler caused some
of my threads to freeze, along with the X.org console (mouse and
keyboard stuck).

The latest discovery, on or right after Christmas 2018, was that all
CPU cores (logical and physical) keep running, as seen in the ksysguard
graphs and the top command, while some threads, typically my late-night
crontab backup jobs, HANG FOR HOURS at random and then, hours later,
RESUME BY THEMSELVES. The backups apparently all completed, but with
delays of up to 12 hours!

I have also seen a frozen X.org screen refresh a little after 45
minutes, but I could not wait any longer, so I restarted sddm over SSH
as mentioned above.


Below is a copy of my post of Dec. 27, 2018 on the AMD community forum:

Dear All,

Today my new discovery indicates that we may be heading in the wrong
direction with regard to CPU core voltage and power states. It has got
to be something else.


[Image: Ksysguard1.png (ksysguard screenshot)]

I used the well-known Linux top command and ksysguard (image above) and
sort of AMBUSHED the problem, waiting to solidly catch a process as it
froze.


And my chance came today. I caught my virtual machine backup crontab
jobs frozen at VMware's vmrun suspend command. Info:

https://docs.vmware.com/en/VMware-Fusion/11/com.vmware.fusion.using.doc/GUID-24F54E24-EFB0-4E94-8A07-2AD791F0E497.html

My cron jobs put each virtual machine into suspend mode and back it up
to a hard disk. I got a clue a few days ago when I checked through my
backups: their folder date/time stamps suggested that the usual backup
jobs, which should normally all be done within 30 minutes, had on 2
occasions taken several hours! There was nothing else wrong besides the
long time spent late at night on the backup; the data seemed to be
completely backed up. That means the lockup or freeze could unfreeze
itself and proceed to a long-delayed completion.


So I ssh'ed into this Ryzen machine at my crontab job hour today,
forwarded X, and ran ksysguard and top on the remote desktop. Yes, the
cron job froze and the backup was not happening. I also used
ps -aux | grep crontab and similar commands, and confirmed that the
cron job was hanging, waiting for vmrun to suspend the VM, and this
command had simply frozen. It stayed frozen for almost 2 hours, and then
completed after this long delay. My script then went on to back up
another virtual machine, and after backing up it is supposed to do a
vmrun resume, but again the resume froze and took more than 1 hour.
After this even my ssh -X session died, and I could not reconnect.
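
A stall like the one described above (a single vmrun step taking hours)
could be recorded by the backup job itself instead of being inferred
from folder time stamps. The following is only a minimal sketch,
assuming Python 3 is available on the server. The function name
run_timed and the 30-minute threshold are hypothetical choices for
illustration (30 minutes is the normal duration of the whole backup
mentioned earlier); the actual backup script is not shown in this post.

```python
import subprocess
import sys
import time


def run_timed(cmd, warn_after_s=1800.0):
    """Run cmd and return (return code, elapsed seconds, slow flag).

    cmd is a list such as the vmrun step of the backup script.
    warn_after_s is an assumed threshold, not a value from the post.
    """
    start = time.monotonic()  # monotonic clock: unaffected by date changes
    completed = subprocess.run(cmd, check=False)
    elapsed = time.monotonic() - start
    return completed.returncode, elapsed, elapsed > warn_after_s


if __name__ == "__main__":
    # Stand-in command for demonstration; replace with the real step.
    rc, elapsed, slow = run_timed([sys.executable, "-c", "pass"])
    print(f"rc={rc} elapsed={elapsed:.2f}s slow={slow}")
```

Logging the elapsed time of every step would show which command stalled
and for how long, even when nobody is watching at the time.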


During these hours, top and ksysguard showed me that other processes and
threads were running, and ALL my 16 logical (8 physical) CPUs were
RUNNING! None of the CPU cores was frozen in C6 or any other power state
while the thread hung for hours. Because of hyperthreading (SMT), every
2 logical CPUs belong to 1 physical CPU core, so if any core had locked
up in a power state during these hours, the graphs of its 2 logical CPUs
would have to die (ZERO % usage). If 2 physical cores had locked up,
then the graphs of 4 logical CPUs would have to die.
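
The same check can be made without a graph by comparing two samples of
/proc/stat, which is where per-CPU time counters are exposed on Linux.
The sketch below is an illustration under assumptions, not the method
used in this post: the function names are hypothetical, idle time is
taken as idle + iowait, and a CPU showing 0 % busy is only a hint, not
proof, that a core is stuck.

```python
def parse_cpu_times(stat_text):
    """Map 'cpuN' -> (idle_ticks, total_ticks) from /proc/stat content."""
    times = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # Keep per-CPU lines such as 'cpu0'; skip the aggregate 'cpu' line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal
        values = [int(v) for v in parts[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        times[parts[0]] = (idle, sum(values))
    return times


def busy_percent(before, after):
    """Percent of non-idle time per CPU between two parsed samples."""
    result = {}
    for cpu, (idle1, total1) in after.items():
        if cpu not in before:
            continue
        idle0, total0 = before[cpu]
        d_total = total1 - total0
        d_idle = idle1 - idle0
        if d_total <= 0:
            result[cpu] = 0.0
        else:
            result[cpu] = 100.0 * (d_total - d_idle) / d_total
    return result
```

To use it, read /proc/stat twice a few seconds apart, pass each text to
parse_cpu_times, and hand both results to busy_percent. Logical CPUs
that stay at 0.0 across many samples would be the ones to look at.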


I am very sure of my observations. They were repeated twice during my
AMBUSH mission today. I am very sure of how my scripts work and how
vmrun works; a similar setup and script has worked for more than 10
years on older AMD and Intel machines. This Ryzen is a recent
replacement for the retired old server.


I am now not inclined to believe that the CPU cores were frozen in deep
sleep power states, nor that it was the Typical Current Idle issue. Not
for my Ryzen machine anyway. It has to be something else, RANDOMLY
LOCKING UP and RANDOMLY UNLOCKING ITSELF, affecting processes/threads
that also appear to be random. I checked the PIDs of these locked-up
jobs; top said they were in an idle state.


While it was locked I went into various /proc folders and files to sniff
for clues, but did not get anything too useful except to see that the
processes were idle:

    /proc/[PID]/status

    /proc/[PID]/task/[PID]/status
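
Reading the State line of these files for every thread of a process can
be scripted, so that it can be repeated while a job is hung. This is a
minimal sketch, assuming Python 3; parse_status and thread_states are
hypothetical helper names. On Linux, a State of "D (disk sleep)" means
an uninterruptible wait inside the kernel, while "S (sleeping)" is an
ordinary interruptible sleep, so the distinction may be a useful clue.

```python
from pathlib import Path


def parse_status(status_text):
    """Parse 'Key:<tab>value' lines of a /proc status file into a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def thread_states(pid, proc_root="/proc"):
    """Return {tid: State string} for every thread of the given pid."""
    states = {}
    task_dir = Path(proc_root) / str(pid) / "task"
    for task in sorted(task_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            text = (task / "status").read_text()
        except OSError:
            continue  # the thread exited while we were reading
        states[int(task.name)] = parse_status(text).get("State", "?")
    return states
```

Calling thread_states with the PID of the hung job and printing the
result every minute would show whether any thread sits in the D state
for the whole duration of the hang.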

My favorite soft reset, systemctl restart sddm, has worked many times,
nearly without fail, I think because it flushes out and kills the
hanging threads: this command kills X and everything else running on X,
which is quite a large number of processes, and it restarts the KDE
display manager (sddm).


I am hoping for a further breakthrough to find out what causes the
threads to LOCK UP and UNLOCK themselves.


Cheers.

-- 
https://bugs.launchpad.net/bugs/1798961

Title:
  Random unrecoverable freezes on Ubuntu 18.10

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged
Status in linux source package in Cosmic:
  Triaged
Status in linux source package in Disco:
  Triaged

Bug description:
  First thing I notice is that the mouse cursor freezes as I'm using it,
  then I hit the CAPS LOCK key and the LED indicator doesn't respond.
  Then I try the "REISUB" command, but it doesn't do anything either.
  Only a hard reset works, pressing down the power button for a few
  seconds.

  How to reproduce?
  I couldn't figure out a consistent method. It is still random to me.

  Version: Ubuntu 4.18.0-10.11-generic 4.18.12
  System information attached.

  Also happens under Arch Linux and Fedora.
  I've talked to another user on IRC who seems to be having the same freezes.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.10
  Package: linux-image-4.18.0-10-generic 4.18.0-10.11
  ProcVersionSignature: Ubuntu 4.18.0-10.11-generic 4.18.12
  Uname: Linux 4.18.0-10-generic x86_64
  ApportVersion: 2.20.10-0ubuntu13
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  dsilva     1213 F.... pulseaudio
   /dev/snd/controlC0:  dsilva     1213 F.... pulseaudio
  CurrentDesktop: XFCE
  Date: Sat Oct 20 09:54:50 2018
  InstallationDate: Installed on 2018-10-20 (0 days ago)
  InstallationMedia: Xubuntu 18.10 "Cosmic Cuttlefish" - Release amd64 (20181017.2)
  MachineType: Dell Inc. Inspiron 5458
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.18.0-10-generic root=/dev/mapper/xubuntu--vg-root ro quiet splash vt.handoff=1
  RelatedPackageVersions:
   linux-restricted-modules-4.18.0-10-generic N/A
   linux-backports-modules-4.18.0-10-generic  N/A
   linux-firmware                             1.175
  RfKill:
   0: phy0: Wireless LAN
    Soft blocked: no
    Hard blocked: no
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 02/02/2018
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A15
  dmi.board.name: 09WGNT
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 9
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvrA15:bd02/02/2018:svnDellInc.:pnInspiron5458:pvr01:rvnDellInc.:rn09WGNT:rvrA00:cvnDellInc.:ct9:cvr:
  dmi.product.name: Inspiron 5458
  dmi.product.sku: Inspiron 5458
  dmi.product.version: 01
  dmi.sys.vendor: Dell Inc.
