** Also affects: linux (Arch Linux)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1850238

Title:
  ucsi_ccg 50 second hang while resuming from s2ram with nvidia, recent
  kernels

Status in linux package in Ubuntu:
  Confirmed
Status in linux package in Arch Linux:
  New

Bug description:
  Short version
  =============
  I'm experiencing a 50-second hang each time I resume from a "deep"
  (suspend-to-RAM) sleep.

  It happens with the newer kernel (5.3 series; I'm currently running
  the version from eoan-proposed), but not with the kernel from
  Ubuntu 18.04.3 LTS (uname says "5.0.0-31-generic #33~18.04.1-Ubuntu
  SMP").

  [I haven't yet tried to test the mainline builds, nor to find/confirm
  the regression range, as this seems like something that will take me
  another week, and I'm not sure if it would be helpful.]

  I narrowed the problem down to what I believe is a broken USB Type-C
  controller on the NVIDIA GPU: the ucsi_ccg driver for
  /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/i2c-0/0-0008 reports
  a timeout for both the initial PPM_RESET command (on system startup)
  and for the SET_NOTIFICATION_ENABLE command the driver runs on resume.

  I guess the hang is the driver waiting for a response to
  SET_NOTIFICATION_ENABLE; that resume-time call appears to have been
  added recently in
  https://github.com/torvalds/linux/commit/a94ecde41f7e51e2742e53b5f151aee662c54d39,
  which could explain why I don't see the hang with 5.0.x.

  Creating /etc/modprobe.d/dell.conf with a `blacklist ucsi_ccg` line
  (and rebooting) makes the hang go away.
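
  A minimal sketch of the workaround above. The function name
  write_ucsi_blacklist and its conf_dir argument are hypothetical and exist
  only so the snippet can be tried against a scratch directory; the file
  name dell.conf and the blacklist line are the ones from the report.

```shell
# Sketch: write a modprobe.d fragment that blacklists ucsi_ccg.
# conf_dir defaults to /etc/modprobe.d (writing there needs root);
# pass another directory to try the function without touching the system.
write_ucsi_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'blacklist ucsi_ccg\n' > "$conf_dir/dell.conf"
}
```

  On a real system this would be run as root with no argument, followed by
  a reboot; afterwards `lsmod | grep ucsi_ccg` should print nothing.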


  Steps to reproduce
  ==================
  (these are not the actual steps one can take to reproduce, starting
  from a new install; let me know if those will be useful)

  1. Boot Ubuntu 19.10 with NVIDIA GPU drivers uninstalled and the
  following kernel parameters
  <https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter>:

        # Force using integrated graphics. (The problem can be reproduced
        # using NVIDIA's proprietary driver too, but I guessed it's better
        # to avoid it, and nouveau prints lots of errors with this GPU.)
        nouveau.modeset=0 nouveau.runpm=0

        # Suspend to RAM; suspend-to-idle has its own problems on this
        # system.
        mem_sleep_default=deep

  2. Run `dmesg -w` and wait a minute or two until a message like the
  following is printed:

        [  175.611346] ucsi_ccg 0-0008: failed to reset PPM!
        [  175.611355] ucsi_ccg 0-0008: PPM init failed (-110)

  (Attempting to suspend before the PPM init timeout will fail to enter
  sleep at all.)
  (If your system doesn't report a PPM init timeout, you probably won't
  see the hang on resume either.)

  3. Run `sudo pm-suspend` (using the power button to suspend causes
  other problems)

  ...wait for the laptop to go to sleep and the fans to turn off.

  4. Press Enter on the built-in keyboard to resume. (Although the way
  we wake up the system doesn't seem to matter.)

  5. Observe a hang lasting for almost a minute before the system is
  operational, with dmesg reporting:

        [  299.331393] ata1.00: configured for UDMA/100
        <note the 47 second long gap>

        [  346.133024] ucsi_ccg 0-0008: PPM NOT RESPONDING
        [  346.133039] PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
        [  346.133042] PM: Device 0-0008 failed to resume: error -110
        ...
        [  346.141504] Restarting tasks ... done.
        [  346.340221] PM: suspend exit
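
  To locate such a gap without reading the log by eye, the timestamps can
  be compared mechanically. This is only a sketch: the function name
  largest_dmesg_gap is hypothetical, and it assumes the default dmesg
  timestamp format shown above ("[seconds.microseconds]" at the start of
  each line).

```shell
# Print the largest gap, in seconds, between consecutive timestamped
# dmesg lines read from stdin, followed by the line that ended the gap.
largest_dmesg_gap() {
    awk '
        # Only consider lines that start with a "[  123.456789]" stamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev > max) { max = ts - prev; line = $0 }
            prev = ts; seen = 1
        }
        END { if (seen) printf "%.3f s before: %s\n", max, line }
    '
}
```

  Usage would be `dmesg | largest_dmesg_gap`. For the two lines around the
  gap above (299.331393 and 346.133024) it reports 46.802 s, which matches
  the roughly 47-second pause.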

  
  System info
  ===========

  My Dell G3 3590 laptop has an NVIDIA "GeForce GTX 1660 Ti with Max-Q
  Design" GPU.
  NVIDIA's "Turing" chips include a USB Type-C controller on the GPU (I
  read that future VR headsets are supposed to use it
  <https://github.com/envytools/envytools/search?q=4d151a19358579c77487ea3f72c32dc97c0250f7..ffd2dc9146482a5469209bbc861ed80adb066d31&type=Commits>),
  and indeed I'm seeing:

  # lspci -tv
  -[0000:00]-+-00.0  Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
             +-01.0-[01]--+-00.0  NVIDIA Corporation TU116M [GeForce GTX 1660 Ti Mobile]
             |            +-00.1  NVIDIA Corporation Device 1aeb
             |            +-00.2  NVIDIA Corporation Device 1aec
             |            \-00.3  NVIDIA Corporation Device 1aed
  ...

  Where the '1aed' device is detected as "NVIDIA USB Type-C Port Policy
  Controller" in Windows.

  I'm not sure if it's serving any useful purpose on this laptop, and it
  certainly doesn't seem to function properly:

  If I enable UCSI logging on startup (root's crontab):

          @reboot bash -c 'echo 1 > /sys/kernel/debug/tracing/events/ucsi/enable'

  ...the steps to reproduce above result in the following
  /sys/kernel/debug/tracing/trace:
  # tracer: nop
  #
  # entries-in-buffer/entries-written: 10/10   #P:12
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            ||| /     delay
  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
  #              | |       |   ||||       |         |
       kworker/6:2-679   [006] ....    68.593915: ucsi_command: control=00000001 (PPM_RESET)
       kworker/6:1-187   [006] ....   151.599387: ucsi_notify: CCI=00000000
       kworker/6:2-679   [006] ....   175.617158: ucsi_reset_ppm: PPM_RESET -> FAIL (err=-110)
       kworker/6:1-187   [006] ....   211.582572: ucsi_notify: CCI=00000000
       kworker/6:1-187   [006] ....   253.577823: ucsi_notify: CCI=00000000
       kworker/6:1-187   [006] ....   295.574520: ucsi_notify: CCI=00000000
        pm-suspend-3448  [007] ....   298.115894: ucsi_command: control=dbe70005 (SET_NOTIFICATION_ENABLE)
        pm-suspend-3448  [005] ....   346.138850: ucsi_run_command: SET_NOTIFICATION_ENABLE -> FAIL (err=-110)
       kworker/6:1-187   [006] ....   370.904651: ucsi_notify: CCI=00000000
       kworker/6:1-187   [006] ....   412.901709: ucsi_notify: CCI=00000000
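
  The time each UCSI command spent waiting can be read off the trace by
  pairing every ucsi_command event with the next result event. The sketch
  below assumes the one-event-per-line layout and field order shown above
  (task, CPU, flags, timestamp, event name); the function name
  ucsi_command_durations is hypothetical.

```shell
# Read an ftrace dump on stdin and print how long each UCSI command took,
# measured from its ucsi_command event to the following result event.
ucsi_command_durations() {
    awk '
        # A command was issued: remember when. "68.593915:" + 0 is 68.593915.
        $5 == "ucsi_command:" { start = $4 + 0; pending = 1; next }
        # The result was logged: report the elapsed time and the outcome.
        ($5 == "ucsi_run_command:" || $5 == "ucsi_reset_ppm:") && pending {
            printf "%s: %.3f s, %s %s\n", $6, ($4 + 0) - start, $8, $9
            pending = 0
        }
    '
}
```

  Run as root, for example with
  `ucsi_command_durations < /sys/kernel/debug/tracing/trace`. For the trace
  above it gives about 107.0 s for PPM_RESET and about 48.0 s for
  SET_NOTIFICATION_ENABLE, the latter matching the hang seen on resume.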

  I updated the BIOS to the latest available (08/28/2019) and installed
  (by booting into Windows) all the other updates available for this
  system from the vendor. I don't know how to check the firmware version
  of the USB-C chip on the GPU, or whether it even exists...

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-20-generic 5.3.0-20.21
  ProcVersionSignature: Ubuntu 5.3.0-20.21-generic 5.3.7
  Uname: Linux 5.3.0-20-generic x86_64
  ApportVersion: 2.20.11-0ubuntu8
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC1:  nickolay   1668 F.... pulseaudio
   /dev/snd/controlC0:  nickolay   1668 F.... pulseaudio
  CurrentDesktop: ubuntu:GNOME
  Date: Tue Oct 29 01:21:28 2019
  InstallationDate: Installed on 2019-10-20 (8 days ago)
  InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
  MachineType: Dell Inc. G3 3590
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-20-generic root=UUID=0b40d72f-d832-47f6-ab77-faccfb6547fe ro nouveau.modeset=0 nouveau.runpm=0 mem_sleep_default=deep quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-20-generic N/A
   linux-backports-modules-5.3.0-20-generic  N/A
   linux-firmware                            1.183.1
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/28/2019
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.7.1
  dmi.board.name: 061RYD
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 10
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr1.7.1:bd08/28/2019:svnDellInc.:pnG33590:pvr:rvnDellInc.:rn061RYD:rvrA00:cvnDellInc.:ct10:cvr:
  dmi.product.family: GSeries
  dmi.product.name: G3 3590
  dmi.product.sku: 0949
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1850238/+subscriptions

