It's not possible to attach logs due to the nature of the issue: the
instances are completely isolated when it happens, and I can only
retrieve logs from the AWS virtual serial console.
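
For illustration, a minimal sketch of pulling that same console output
programmatically with boto3 (the instance ID below is hypothetical, and
this assumes AWS credentials and a region are already configured):

```
import base64
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical ID of a hung instance

ec2 = boto3.client("ec2")

# GetConsoleOutput returns the buffered kernel console output
# (base64-encoded per the EC2 API), which is where the hung-task trace
# shows up when the instance is otherwise unreachable.
resp = ec2.get_console_output(InstanceId=INSTANCE_ID, Latest=True)
print(base64.b64decode(resp.get("Output", "")).decode("utf-8", errors="replace"))
```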

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2022329

Title:
  EBS volume attachment during boot randomly causes EC2 instances to get
  stuck

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  We create and deploy custom AMIs based on Ubuntu Jammy, and since
  jammy-20230428 we have noticed that instances built from these AMIs
  randomly fail during the boot process. Destroying the instance and
  deploying again gets rid of the problem. The stack trace is always the
  same:

  ```
  [  849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
  [  849.774999]       Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
  [  849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  849.811223] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
  [  849.883494] Call Trace:
  [  849.891369]  <TASK>
  [  849.899306]  __schedule+0x254/0x5a0
  [  849.907878]  schedule+0x5d/0x100
  [  849.917136]  io_schedule+0x46/0x80
  [  849.970890]  blk_mq_get_tag+0x117/0x300
  [  849.976136]  ? destroy_sched_domains_rcu+0x40/0x40
  [  849.981442]  __blk_mq_alloc_requests+0xc4/0x1e0
  [  849.986750]  blk_mq_get_new_requests+0xcc/0x190
  [  849.992185]  blk_mq_submit_bio+0x1eb/0x450
  [  850.070689]  __submit_bio+0xf6/0x190
  [  850.075545]  submit_bio_noacct_nocheck+0xc2/0x120
  [  850.080841]  submit_bio_noacct+0x209/0x560
  [  850.085654]  submit_bio+0x40/0xf0
  [  850.090361]  submit_bh_wbc+0x134/0x170
  [  850.094905]  ll_rw_block+0xbc/0xd0
  [  850.175198]  do_readahead.isra.0+0x126/0x1e0
  [  850.183531]  jread+0xeb/0x100
  [  850.189648]  do_one_pass+0xbb/0xb90
  [  850.193917]  ? crypto_create_tfm_node+0x9a/0x120
  [  850.207511]  ? crc_43+0x1e/0x1e
  [  850.211887]  jbd2_journal_recover+0x8d/0x150
  [  850.272927]  jbd2_journal_load+0x130/0x1f0
  [  850.280601]  ext4_load_journal+0x271/0x5d0
  [  850.288540]  __ext4_fill_super+0x2aa1/0x2e10
  [  850.296290]  ? pointer+0x36f/0x500
  [  850.304910]  ext4_fill_super+0xd3/0x280
  [  850.372470]  ? ext4_fill_super+0xd3/0x280
  [  850.380637]  get_tree_bdev+0x189/0x280
  [  850.384398]  ? __ext4_fill_super+0x2e10/0x2e10
  [  850.388490]  ext4_get_tree+0x15/0x20
  [  850.392123]  vfs_get_tree+0x2a/0xd0
  [  850.395859]  do_new_mount+0x184/0x2e0
  [  850.468151]  path_mount+0x1f3/0x890
  [  850.471804]  ? putname+0x5f/0x80
  [  850.475341]  init_mount+0x5e/0x9f
  [  850.478976]  do_mount_root+0x8d/0x124
  [  850.482626]  mount_block_root+0xd8/0x1ea
  [  850.486368]  mount_root+0x62/0x6e
  [  850.568079]  prepare_namespace+0x13f/0x19e
  [  850.571984]  kernel_init_freeable+0x120/0x139
  [  850.575930]  ? rest_init+0xe0/0xe0
  [  850.579511]  kernel_init+0x1b/0x170
  [  850.583084]  ? rest_init+0xe0/0xe0
  [  850.586642]  ret_from_fork+0x22/0x30
  [  850.668205]  </TASK>
  ```
  This has been happening since 5.19.0-1024-aws; I have now rolled back to 5.19.0-1022-aws.

  I can easily reproduce this in my dev environment, where 15 EC2
  instances are deployed at once: 1 or 2 of them randomly fail, and the
  next deployment succeeds.

  I cannot debug further because the instance is unreachable; I could
  only copy the trace above thanks to the virtual serial console. If
  there are boot options I could add to provide more information, I will
  be happy to do that.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2022329/+subscriptions

