Public bug reported:

The problem occurs when repeatedly rebooting a freshly installed Ubuntu 20.04
guest in VirtualBox.
Host OS:
```
No LSB modules are available.
Description:    Ubuntu 23.04
Release:        23.04
```
Guest OS:
```
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
```
VirtualBox version:
```
VirtualBox Graphical User Interface Version 7.0.6_Ubuntu r155176
7.0.6_Ubuntur155176
```
Steps to reproduce:
Reboot the virtual machine several times using the `reboot` command.
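The reproduction step can be automated inside the guest. The following is a minimal sketch, assuming it runs as root once per boot after userspace is up; the function name `reboot_loop_step`, the counter path and the `COUNTER_FILE`/`TARGET`/`REBOOT_CMD` variables are hypothetical, not from the report. Because the counter is only written from userspace, a boot that hangs in the kernel leaves the count at the number of successful boots.

```
#!/usr/bin/env bash
# Hypothetical guest-side helper: bump a boot counter and reboot again
# until TARGET boots have completed. All names here are illustrative.
reboot_loop_step() {
    local counter_file="${COUNTER_FILE:-/var/tmp/reboot-count}"
    local target="${TARGET:-50}"
    local count=0
    if [[ -r "$counter_file" ]]; then
        count=$(<"$counter_file")      # boots completed so far
    fi
    count=$((count + 1))
    echo "$count" > "$counter_file"
    echo "boot #$count of $target"
    if (( count < target )); then
        # Left unquoted on purpose so a command with arguments can be
        # substituted (e.g. for a dry run); defaults to a normal reboot.
        ${REBOOT_CMD:-/sbin/reboot}
    fi
}
```

For example, save the function plus a final `reboot_loop_step` call as a root-owned script and add a root crontab entry such as `@reboot sleep 60 && /path/to/script`; the 60-second delay is an arbitrary choice to let the boot settle. Remove the crontab entry to stop the loop.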

Actual behavior:
At some reboot, the boot hangs, and the last message printed is
`[    0.000000] Linux agpgart interface v0.103`. The messages before it show
significant discrepancies in their timestamps.
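The timestamp discrepancies can be located mechanically in a saved boot log such as the attached `ubuntu_dmesg.log`. A minimal sketch, assuming the usual dmesg line format `[   12.345678] message`; the function name `find_ts_anomalies` and the 5-second default jump threshold are arbitrary choices, not taken from the report:

```
#!/usr/bin/env bash
# Hypothetical helper: read dmesg-style lines on stdin and report every line
# whose timestamp goes backwards or jumps forward by more than THRESHOLD seconds.
find_ts_anomalies() {
    local threshold="${1:-5}"
    awk -v thr="$threshold" '
        match($0, /^\[ *[0-9]+\.[0-9]+]/) {
            # Strip the surrounding brackets and convert to a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && (ts < prev || ts - prev > thr))
                printf "line %d: %.6f -> %.6f: %s\n", NR, prev, ts, $0
            prev = ts
            seen = 1
        }'
}
```

Usage: `find_ts_anomalies < ubuntu_dmesg.log`, or `find_ts_anomalies 1 < ubuntu_dmesg.log` for a tighter threshold.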

Expected behavior:
Normal boot. 


The issue reproduces consistently across VirtualBox versions (from
6.1.28 r147628 to 7.0) and across host kernel versions; 5.16 and 5.19 host
kernels were also tried.


The issue persists after changing various VirtualBox VM options. Disabling
sound, graphics, PAE/NX, VT-x nested virtualization, or KVM nested paging does
not make it go away. Enabling or disabling the serial port has no effect.
Unchecking the VirtualBox option System > Enable Hardware Clock in UTC Time
does not help either.
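While the VM is stuck, the guest kernel ignores SysRq requests sent via:
```
VM="Ubuntu"
PRESS="26"   # make code of the SysRq command key (0x26 = L)
# Break (release) code = make code + 0x80
RELEASE=$(printf "%X" $((0x${PRESS} + 0x80)))
# 1d/38/54 press Ctrl/Alt/SysRq; d4/b8/9d release them in reverse order
VBoxManage controlvm "$VM" keyboardputscancode 1d 38 54 "${PRESS}" "${RELEASE}" d4 b8 9d
```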
(All the other SysRq commands were tried as well. To make sure the setup was
correct, I double-checked that SysRq was enabled and sent the same SysRq
sequences to a normally booted VM, where they worked.)
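Trying every SysRq command by hand means recomputing the scancodes for each letter. The helpers below are a minimal sketch for that, assuming scancode set 1 (the set the `1d 38 54` sequence above uses) and letter keys only; `sysrq_seq` and `sysrq_mask_decode` are hypothetical names. The second helper decodes `/proc/sys/kernel/sysrq` using the bit values from the kernel's SysRq documentation; SysRq-L (key `26` in the command above, backtrace of all active CPUs) belongs to the `dump` group (value 8), which Ubuntu's usual default of 176 does not include.

```
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative.

# Print the set-1 scancode sequence for Ctrl+Alt+SysRq+<letter>, in the same
# order as above: Ctrl, Alt, SysRq down; key down/up; SysRq, Alt, Ctrl up.
sysrq_seq() {
    local key="${1,,}"                          # lower-case (bash 4+)
    if [[ ! "$key" =~ ^[a-z]$ ]]; then
        echo "sysrq_seq: expected a single letter, got '$1'" >&2
        return 1
    fi
    local -A make_code=(
        [q]=10 [w]=11 [e]=12 [r]=13 [t]=14 [y]=15 [u]=16 [i]=17 [o]=18 [p]=19
        [a]=1e [s]=1f [d]=20 [f]=21 [g]=22 [h]=23 [j]=24 [k]=25 [l]=26
        [z]=2c [x]=2d [c]=2e [v]=2f [b]=30 [n]=31 [m]=32
    )
    local press="${make_code[$key]}"
    local release
    release=$(printf '%02x' $((0x$press + 0x80)))   # break code = make + 0x80
    echo "1d 38 54 $press $release d4 b8 9d"
}

# Decode /proc/sys/kernel/sysrq: 0 = disabled, 1 = everything enabled,
# otherwise a bitmask of allowed function groups.
sysrq_mask_decode() {
    local mask="$1"
    if (( mask == 0 )); then echo "disabled"; return 0; fi
    if (( mask == 1 )); then echo "all"; return 0; fi
    local -a names=([1]=loglevel [2]=keyboard [3]=dump [4]=sync
                    [5]=remount-ro [6]=signal [7]=reboot [8]=nice-rt)
    local bit out=()
    for bit in 1 2 3 4 5 6 7 8; do
        if (( mask & (1 << bit) )); then out+=("${names[bit]}"); fi
    done
    echo "${out[*]}"
}
```

Usage: `VBoxManage controlvm "$VM" keyboardputscancode $(sysrq_seq t)` (unquoted on purpose so the sequence splits into separate arguments) and `sysrq_mask_decode "$(cat /proc/sys/kernel/sysrq)"`. Since the stack trace later in this report shows the hang inside `kernel_init_freeable`, before userspace starts, a mask set through sysctl may not yet be in effect at that point; the `sysrq_always_enabled` boot parameter exists for that situation.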

Rebuilding the kernel with debug symbols does not affect the problem. Raising
the log verbosity did not produce any additional information. Enabling or
disabling KASLR and ASLR made no difference. Switching the clocksource to
kvm-clock, tsc, or acpi_pm did not fix the problem.
SIGNIFICANT NOTE: The problem goes away with > 2 CPUs and 1 GB of RAM.
Attaching kdb/kgdb does not seem possible, because SysRq is ignored.
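The clocksource and KASLR experiments can be made through kernel command-line parameters such as `clocksource=tsc`, `clocksource=kvm-clock`, `clocksource=acpi_pm`, or `nokaslr`, which on Ubuntu are usually set in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. A minimal sketch of a helper for that, assuming GNU sed, a single-line double-quoted value as on a stock install, and a parameter without sed metacharacters (`/`, `&`, `\`); `add_kernel_param` is a hypothetical name:

```
#!/usr/bin/env bash
# Hypothetical helper: append one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given GRUB defaults file, unless it is already present.
add_kernel_param() {
    local file="$1" param="$2"
    # Already present as a whole word inside the quoted value? Then do nothing.
    if grep -qE "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert " param" just before the closing double quote.
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file"
}
```

Run it as root, for example `add_kernel_param /etc/default/grub clocksource=tsc && update-grub`, reboot, and confirm with `cat /sys/devices/system/clocksource/clocksource0/current_clocksource`; `available_clocksource` in the same directory lists the candidates.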


The issue also reproduces for other users:
https://www.reddit.com/r/linuxquestions/comments/ols6f1/ubuntu_server_2004_boot_suddenly_it_freezes_at/


After some printk debugging, I managed to find the source of the problem and
capture a stack trace with the VirtualBox debugger: `blk_mq_freeze_queue_wait`
is stuck during loop driver initialization.
```
#  RBP              Ret SS:RBP            Ret RIP          CS:RIP / Symbol [line]
00 ffffc90000013c80 0000:ffffc90000013c90 ffffffff81aaf4ba kallsyms!blk_mq_freeze_queue_wait
   retn/64
01 ffffc90000013c90 0000:ffffc90000013cc0 ffffffff81ab0d47 kallsyms!blk_mq_freeze_queue+e
   retn/64
02 ffffc90000013cc0 0000:ffffc90000013ce0 ffffffff81ab0f29 kallsyms!wbt_init+1af
   retn/64
03 ffffc90000013ce0 0000:ffffc90000013d28 ffffffff81aaef13 kallsyms!wbt_enable_default+b6
   retn/64
04 ffffc90000013d28 0000:ffffc90000013d90 ffffffff814f2760 kallsyms!blk_register_queue+358
   retn/64
05 ffffc90000013d90 0000:ffffc90000013da0 ffffffff814f27a3 kallsyms!__device_add_disk+450
   retn/64
06 ffffc90000013da0 0000:ffffc90000013dd8 ffffffff81aca41c kallsyms!device_add_disk+13
   retn/64
07 ffffc90000013dd8 0000:ffffc90000013e10 ffffffff82d0f072 kallsyms!loop_add+327
   retn/64
08 ffffc90000013e10 0000:ffffc90000013e88 ffffffff810037da kallsyms!loop_init+134
   retn/64
09 ffffc90000013e88 0000:ffffc90000013f38 ffffffff82ca240c kallsyms!do_one_initcall+4a
   retn/64
0a ffffc90000013f38 0000:ffffc90000013f48 ffffffff81aedd6e kallsyms!kernel_init_freeable+1e6
   retn/64
0b ffffc90000013f48 0000:0000000000000000 ffffffff81c00255 kallsyms!kernel_init+e
   retn/64
0c 0000000000000000 0000:0000000000000000 0000000000000000 kallsyms!ret_from_fork+35
```
`blk_mq_freeze_queue_wait` just calls `wait_event` with the condition
`percpu_ref_is_zero(&q->q_usage_counter)`, and the counter is not zero when
the wait starts.
Inside the `wait_event` macro, `prepare_to_wait_event` is reached but
`finish_wait` is not (checked with debugger breakpoints).
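To map the `symbol+offset` frames in the trace to source lines, the debug-symbol kernel build can be combined with the kernel tree's `scripts/faddr2line`. A minimal sketch, assuming the VirtualBox debugger prints offsets in hex without a `0x` prefix (as in `wbt_init+1af`); `vbox_frames_to_faddr` is a hypothetical name, and frames printed without an offset, such as the top `blk_mq_freeze_queue_wait` frame, are skipped:

```
#!/usr/bin/env bash
# Hypothetical helper: turn VirtualBox debugger frames such as
# "kallsyms!wbt_init+1af" (read from stdin) into "wbt_init+0x1af",
# one per line, in the func+offset form that faddr2line accepts.
vbox_frames_to_faddr() {
    grep -oE 'kallsyms![A-Za-z0-9_.]+\+[0-9a-f]+' |
        sed -E 's/^kallsyms!//; s/\+([0-9a-f]+)$/+0x\1/'
}
```

Usage, from the root of the kernel source tree used for the debug build, with the debugger output saved in a file (here the hypothetical `trace.txt`): `vbox_frames_to_faddr < trace.txt | xargs ./scripts/faddr2line vmlinux`.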

Thank you for your efforts in addressing this matter. I appreciate your
attention and assistance in resolving the issue. If there is any
additional information or logs that would be helpful in investigating
the problem, please let me know. I have confidence in your expertise and
appreciate your support. Best regards.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: kernel-bug

** Attachment added: "Ubuntu boot log file"
   
https://bugs.launchpad.net/bugs/2022097/+attachment/5677206/+files/ubuntu_dmesg.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2022097

Title:
  Ubuntu 20.04 Stuck at reboot in Virtualbox kernel version
  5.4.0-149-generic #166 SMP

Status in linux package in Ubuntu:
  New
