Public bug reported:

This might not really be a linux-gcp bug, but it follows on from the
work on bug 2039732, and so far I have not been able to reproduce it
locally.

The setup is an up-to-date 22.04 on GCP n2-standard instances, with no
GPU attached and thus relying on vkms. I have recreated a similar setup
locally, but on a KVM host.

We rely on generic-worker
(https://github.com/taskcluster/taskcluster/tree/main/workers/generic-worker#readme)
to run tasks on CI. In particular, generic-worker will:
 - create a new task_XXX user
 - configure gdm3 to autologin that user
 - probe for the existence of the GNOME Wayland session before
   launching the task

We relied on the wl-clipboard package to verify the state of the
Wayland session.

With that setup in place, here is the issue.

We issue a TC task with payload:
> export WAYLAND_DISPLAY=wayland-0
> export XDG_RUNTIME_DIR=/run/user/$(id -u)
> wl-paste -l -p

We expect that payload to report "No selection", but on GCP instances we
almost always end up with "This seat has no keyboard". There were also
cases where the session was not Wayland at all but X11. I think this
points to the availability of /dev/dri/card0, but forcing the gdm3
service to wait for it, and adding extra waiting time after card0 is
present, still did not get us anywhere.
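
As an illustration of the probe described above, here is a minimal
sketch of how the wl-paste check could be retried instead of run once.
It assumes a POSIX shell and that WAYLAND_DISPLAY and XDG_RUNTIME_DIR
are already exported as in the payload. The function names
classify_probe and wait_for_wayland are hypothetical and are not part
of generic-worker; only the wl-paste command and the two messages come
from this report.

```shell
#!/bin/sh
# Hypothetical helpers: only "wl-paste -l -p" and the two messages
# matched below are taken from the bug report.

# Map the text printed by "wl-paste -l -p" to a coarse state.
classify_probe() {
    case "$1" in
        *"This seat has no keyboard"*) echo "no-keyboard" ;;
        *"No selection"*)              echo "ready" ;;
        *)                             echo "unknown" ;;
    esac
}

# Poll until the probe reports "ready" or the attempts run out.
# $1 = maximum attempts (default 30), $2 = delay in seconds (default 1).
wait_for_wayland() {
    attempts="${1:-30}"
    delay="${2:-1}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The message may go to stderr, so capture both streams.
        out="$(wl-paste -l -p 2>&1)"
        if [ "$(classify_probe "$out")" = "ready" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Calling "wait_for_wayland 60 1" would poll for about a minute. Note
that this sketch treats anything other than "No selection" as not
ready, which only makes sense for a fresh session with an empty primary
selection.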

We enabled debugging for both gdm3 and mutter but never found a good
lead on why the session was not yet ready.

At some point, our user's session on seat0 was shown as inactive while
the active one belonged to gdm, so we suspected this was the cause.
However, neither forcing our session to be active nor terminating the
gdm session unblocked us.
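
For reference, the session state described above can be inspected with
loginctl. The sketch below is a hypothetical helper (the name
session_is_active_wayland is an assumption, not something we ran) that
reads the "Key=Value" lines printed by "loginctl show-session" and
succeeds only when the session is both active and of type wayland.

```shell
#!/bin/sh
# Hypothetical helper: reads "Key=Value" lines on stdin, as printed by
#   loginctl show-session <ID> -p Type -p Active -p State
# and returns 0 only for an active Wayland session.
session_is_active_wayland() {
    s_type=""
    s_active=""
    while IFS='=' read -r key value; do
        case "$key" in
            Type)   s_type="$value" ;;
            Active) s_active="$value" ;;
        esac
    done
    [ "$s_type" = "wayland" ] && [ "$s_active" = "yes" ]
}
```

With a session ID taken from "loginctl list-sessions", this could be
used as:
> loginctl show-session "$id" -p Type -p Active -p State | session_is_active_wayland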

We also suspected the desktop was locking itself, so we disabled locking
with the following, but it did not help much:
> cat > /etc/dconf/profile/user << EOF
> user-db:user
> system-db:local
> EOF
> 
> mkdir -p /etc/dconf/db/local.d/
> # dconf user settings
> cat > /etc/dconf/db/local.d/00-tc-gnome-settings << EOF
> # /org/gnome/desktop/session/idle-delay
> [org/gnome/desktop/session]
> idle-delay=uint32 0
> # /org/gnome/desktop/lockdown/disable-lock-screen
> [org/gnome/desktop/lockdown]
> disable-lock-screen=true
> EOF
> 
> sudo dconf update


In the end, the only fix that proved viable and reliable (verified over
hundreds of runs now) was to add a "/bin/sleep 30" call to the gdm3
startup:
> mkdir -p /etc/systemd/system/gdm.service.d/
> cat > /etc/systemd/system/gdm.service.d/gdm-wait.conf << EOF
> [Unit]
> Description=Extra 30s wait
> [Service]
> ExecStartPre=/bin/sleep 30
> EOF
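
The same drop-in can be written by a small parameterized function,
which makes it easier to experiment with shorter delays. This is only a
sketch: the function name write_gdm_wait_dropin and its arguments are
hypothetical, and the generated content mirrors the drop-in above.

```shell
#!/bin/sh
# Hypothetical helper: write the gdm-wait.conf drop-in into directory $1
# with a delay of $2 seconds (default 30, as in the report).
write_gdm_wait_dropin() {
    dir="$1"
    seconds="${2:-30}"
    mkdir -p "$dir" || return 1
    cat > "$dir/gdm-wait.conf" << EOF
[Unit]
Description=Extra ${seconds}s wait
[Service]
ExecStartPre=/bin/sleep ${seconds}
EOF
}
```

As root, "write_gdm_wait_dropin /etc/systemd/system/gdm.service.d 30"
followed by "systemctl daemon-reload" should make systemd pick up the
drop-in; "systemctl cat gdm.service" shows whether it was merged.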

** Affects: linux-gcp (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-gcp in Ubuntu.
https://bugs.launchpad.net/bugs/2062534

Title:
  GDM3 autologin might be racy on GCP resulting in inconsistent state of
  the wayland setup of seat0

Status in linux-gcp package in Ubuntu:
  New
