Public bug reported:
We create and deploy custom AMIs based on Ubuntu Jammy, and since
jammy-20230428 we have noticed that instances built from these AMIs
randomly fail during the boot process. Destroying the instance and
deploying it again works around the problem. The stack trace is always
the same:
```
[ 849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
[ 849.774999] Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[ 849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 849.811223] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
[ 849.883494] Call Trace:
[ 849.891369] <TASK>
[ 849.899306] __schedule+0x254/0x5a0
[ 849.907878] schedule+0x5d/0x100
[ 849.917136] io_schedule+0x46/0x80
[ 849.970890] blk_mq_get_tag+0x117/0x300
[ 849.976136] ? destroy_sched_domains_rcu+0x40/0x40
[ 849.981442] __blk_mq_alloc_requests+0xc4/0x1e0
[ 849.986750] blk_mq_get_new_requests+0xcc/0x190
[ 849.992185] blk_mq_submit_bio+0x1eb/0x450
[ 850.070689] __submit_bio+0xf6/0x190
[ 850.075545] submit_bio_noacct_nocheck+0xc2/0x120
[ 850.080841] submit_bio_noacct+0x209/0x560
[ 850.085654] submit_bio+0x40/0xf0
[ 850.090361] submit_bh_wbc+0x134/0x170
[ 850.094905] ll_rw_block+0xbc/0xd0
[ 850.175198] do_readahead.isra.0+0x126/0x1e0
[ 850.183531] jread+0xeb/0x100
[ 850.189648] do_one_pass+0xbb/0xb90
[ 850.193917] ? crypto_create_tfm_node+0x9a/0x120
[ 850.207511] ? crc_43+0x1e/0x1e
[ 850.211887] jbd2_journal_recover+0x8d/0x150
[ 850.272927] jbd2_journal_load+0x130/0x1f0
[ 850.280601] ext4_load_journal+0x271/0x5d0
[ 850.288540] __ext4_fill_super+0x2aa1/0x2e10
[ 850.296290] ? pointer+0x36f/0x500
[ 850.304910] ext4_fill_super+0xd3/0x280
[ 850.372470] ? ext4_fill_super+0xd3/0x280
[ 850.380637] get_tree_bdev+0x189/0x280
[ 850.384398] ? __ext4_fill_super+0x2e10/0x2e10
[ 850.388490] ext4_get_tree+0x15/0x20
[ 850.392123] vfs_get_tree+0x2a/0xd0
[ 850.395859] do_new_mount+0x184/0x2e0
[ 850.468151] path_mount+0x1f3/0x890
[ 850.471804] ? putname+0x5f/0x80
[ 850.475341] init_mount+0x5e/0x9f
[ 850.478976] do_mount_root+0x8d/0x124
[ 850.482626] mount_block_root+0xd8/0x1ea
[ 850.486368] mount_root+0x62/0x6e
[ 850.568079] prepare_namespace+0x13f/0x19e
[ 850.571984] kernel_init_freeable+0x120/0x139
[ 850.575930] ? rest_init+0xe0/0xe0
[ 850.579511] kernel_init+0x1b/0x170
[ 850.583084] ? rest_init+0xe0/0xe0
[ 850.586642] ret_from_fork+0x22/0x30
[ 850.668205] </TASK>
```
This happens since 5.19.0-1024-aws; I have now rolled back to 5.19.0-1022-aws.
I can easily reproduce it in my dev environment, where 15 EC2 instances
are deployed at once: 1 or 2 of them randomly fail, and the next
deployment succeeds.
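For reference, a minimal sketch of how stuck instances in such a batch
can be spotted, assuming the AWS CLI v2 (the tag value and instance ID
below are hypothetical):
```
# List the instance IDs of the batch (hypothetical tag value "dev-batch").
ids=$(aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=dev-batch" \
        --query 'Reservations[].Instances[].InstanceId' --output text)

# An instance hung during boot usually fails its reachability status check.
aws ec2 describe-instance-status --instance-ids $ids \
    --query 'InstanceStatuses[].[InstanceId,InstanceStatus.Status]' \
    --output table

# Capture the serial console output of a stuck instance (Nitro instances
# support --latest; the instance ID is a placeholder).
aws ec2 get-console-output --instance-id i-0123456789abcdef0 \
    --latest --query Output --output text
```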
I cannot debug further because I have no connection to the affected
instance; I was able to copy the trace above thanks to the virtual
serial console. If there are boot options I could add to gather more
information, I would be happy to do so.
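As a sketch, these standard kernel parameters might surface more detail
on the serial console; this assumes a GRUB-based Ubuntu AMI and is not a
confirmed recipe for this particular bug:
```
# Extra kernel command-line options for debugging a boot hang:
#   console=ttyS0,115200              keep kernel output on the serial port
#   loglevel=7                        print all kernel messages, incl. debug
#   initcall_debug                    log every initcall to localize boot hangs
#   sysctl.kernel.hung_task_panic=1   panic on the first hung task (full dump)
#
# Append them in /etc/default/grub, e.g.:
# GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0,115200 loglevel=7 initcall_debug sysctl.kernel.hung_task_panic=1"
sudo update-grub   # regenerate grub.cfg, then bake a new AMI from the instance
```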
** Affects: linux (Ubuntu)
Importance: Undecided
Status: Incomplete
** Tags: aws ebs jammy ubuntu
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2022329
Title:
  EBS volume attachment during boot randomly causes EC2 instances to get
  stuck
Status in linux package in Ubuntu:
Incomplete