On Thu, Oct 19, 2017 at 03:35:22PM -0700, Andrei Vagin wrote: > On Thu, Oct 19, 2017 at 01:16:55PM -0500, Josh Poimboeuf wrote: > > On Thu, Oct 19, 2017 at 09:51:04AM -0700, Andrei Vagin wrote: > > > Hi, > > > > > > We run CRIU tests for tip/auto-latest regularly, and a few days ago our > > > test job started to detect this warning in a kernel log: > > > > > > [ 44.235786] WARNING: can't dereference iret registers at > > > ffff8801c5f17fe0 for ip ffffffff95f0d94b > > > > > > What does it mean? How critical is it? > > > > > > Our test job fails if it detects any warning in a kernel log. Maybe we > > > need to investigate reasons of this warning and try to eliminate it? > > > > > > Here are logs: > > > https://travis-ci.org/avagin/linux/jobs/289676634 > > > > I think it means the unwinder found some bad ORC unwinder metadata. Any > > chance you have access to the kernel binary? I need to know what code > > corresponds to that ffffffff95f0d94b address. > > > > Or if you can reproduce with the following patch, that should help: > > > > > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c > > index 570b70d3f604..95b633f0ce51 100644 > > --- a/arch/x86/kernel/unwind_orc.c > > +++ b/arch/x86/kernel/unwind_orc.c > > @@ -448,7 +448,7 @@ bool unwind_next_frame(struct unwind_state *state) > > > > case ORC_TYPE_REGS_IRET: > > if (!deref_stack_regs(state, sp, &state->ip, &state->sp, > > false)) { > > - orc_warn("can't dereference iret registers at %p for ip > > %p\n", > > + orc_warn("can't dereference iret registers at %p for ip > > %pB\n", > > (void *)sp, (void *)orig_ip); > > goto done; > > } > > I applied your patch and rerun tests. > > [ 44.947699] WARNING: can't dereference iret registers at ffff880178f5ffe0 > for ip int3+0x5b/0x60 > > and now here is a warning from kasan: > > [ 477.775676] > ================================================================== > [ 477.775845] BUG: KASAN: stack-out-of-bounds in deref_stack_reg+0x11d/0x150 > [ 477.775952] Read of size 8 at addr ffff880166b7fe90 by task make/16028 > [ 477.776055] > [ 477.776149] CPU: 0 PID: 16028 Comm: make Not tainted 4.14.0-rc5+ #1 > [ 477.776152] Hardware name: Google Google Compute Engine/Google Compute > Engine, BIOS Google 01/01/2011 > [ 477.776155] Call Trace: > [ 477.776159] <IRQ> > [ 477.776167] dump_stack+0x5c/0x7e > [ 477.776175] print_address_description+0x6b/0x290 > [ 477.776182] ? deref_stack_reg+0x11d/0x150 > [ 477.776186] kasan_report+0x25d/0x340 > [ 477.776194] deref_stack_reg+0x11d/0x150 > [ 477.776201] ? __read_once_size_nocheck.constprop.6+0x10/0x10 > [ 477.776206] ? get_stack_info+0x37/0x170 > [ 477.776212] ? stack_access_ok+0xdc/0x150 > [ 477.776221] unwind_next_frame+0xe35/0x1c10 > [ 477.776230] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776238] ? deref_stack_reg+0x150/0x150 > [ 477.776247] ? is_bpf_text_address+0x54/0x60 > [ 477.776253] ? kernel_text_address+0xf4/0x100 > [ 477.776257] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776266] __save_stack_trace+0x73/0xd0 > [ 477.776277] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776285] save_stack+0x33/0xb0 > [ 477.776291] ? kasan_slab_free+0x70/0xc0 > [ 477.776298] ? kmem_cache_free+0x9f/0x230 > [ 477.776303] ? rcu_process_callbacks+0x451/0xd60 > [ 477.776307] ? __do_softirq+0x1d3/0x5e0 > [ 477.776312] ? irq_exit+0x146/0x170 > [ 477.776322] ? smp_apic_timer_interrupt+0x13e/0x3b0 > [ 477.776326] ? apic_timer_interrupt+0x8c/0xa0 > [ 477.776331] ? lock_acquire+0x6b/0x260 > [ 477.776336] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776347] ? update_curr+0x2d6/0x600 > [ 477.776354] ? posix_cpu_timers_exit_group+0x50/0x50 > [ 477.776365] ? trigger_load_balance+0x1fd/0x8a0 > [ 477.776374] ? note_gp_changes+0x14e/0x1b0 > [ 477.776384] ? lock_downgrade+0x590/0x590 > [ 477.776389] ? rcu_accelerate_cbs+0x106/0x5e0 > [ 477.776398] ? lock_acquire+0x113/0x260 > [ 477.776402] ? rcu_process_callbacks+0x407/0xd60 > [ 477.776407] kasan_slab_free+0x70/0xc0 > [ 477.776414] ? rcu_process_callbacks+0x451/0xd60 > [ 477.776418] kmem_cache_free+0x9f/0x230 > [ 477.776425] ? free_inode_nonrcu+0x20/0x20 > [ 477.776430] rcu_process_callbacks+0x451/0xd60 > [ 477.776443] ? note_gp_changes+0x1b0/0x1b0 > [ 477.776451] ? native_apic_msr_write+0x27/0x30 > [ 477.776456] ? lapic_next_event+0x55/0x80 > [ 477.776465] __do_softirq+0x1d3/0x5e0 > [ 477.776479] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776483] irq_exit+0x146/0x170 > [ 477.776487] smp_apic_timer_interrupt+0x13e/0x3b0 > [ 477.776494] apic_timer_interrupt+0x8c/0xa0 > [ 477.776497] </IRQ> > [ 477.776502] RIP: 0010:lock_acquire+0x6b/0x260 > [ 477.776505] RSP: 0018:ffff880166b7fd48 EFLAGS: 00000246 ORIG_RAX: > ffffffffffffff11 > [ 477.776512] RAX: 0000000000000007 RBX: ffff8801c91cb080 RCX: > 0000000000000000 > [ 477.776515] RDX: 0000000000000000 RSI: 0000000000000000 RDI: > ffff8801c91cb8b4 > [ 477.776518] RBP: 0000000000000000 R08: 0000000000000001 R09: > 0000000000000000 > [ 477.776521] R10: 0000000000000000 R11: 0000000000000000 R12: > 0000000000000000 > [ 477.776524] R13: 0000000000000000 R14: 0000000000000000 R15: > ffffffff9f651fce > [ 477.776528] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776552] do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776559] ? fs_reclaim_acquire.part.71+0x29/0x30 > [ 477.776564] ? fs_reclaim_acquire.part.71+0x5/0x30 > [ 477.776569] ? kmem_cache_alloc+0x29/0x1f0 > [ 477.776577] ? do_execveat_common.isra.34+0x78e/0x1890 > [ 477.776589] ? strncpy_from_user+0x74/0x260 > [ 477.776595] ? prepare_bprm_creds+0x100/0x100 > [ 477.776599] ? kmem_cache_alloc+0x18d/0x1f0 > [ 477.776607] ? getname_flags+0xff/0x500 > [ 477.776615] ? SyS_execve+0x2c/0x40 > [ 477.776623] ? ptregs_sys_vfork+0x10/0x10 > [ 477.776628] ? do_syscall_64+0x181/0x450 > [ 477.776638] ? entry_SYSCALL64_slow_path+0x25/0x25 > [ 477.776653] > [ 477.776747] The buggy address belongs to the page: > [ 477.776849] page:ffffea00059adfc0 count:0 mapcount:0 mapping: > (null) index:0x0 > [ 477.776968] flags: 0x17fff8000000000() > [ 477.777067] raw: 017fff8000000000 0000000000000000 0000000000000000 > 00000000ffffffff > [ 477.777184] raw: 0000000000000000 dead000000000200 0000000000000000 > 0000000000000000 > [ 477.777298] page dumped because: kasan: bad access detected > [ 477.777404] > [ 477.777494] Memory state around the buggy address: > [ 477.777594] ffff880166b7fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > [ 477.777709] ffff880166b7fe00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 > f1 00 > [ 477.777823] >ffff880166b7fe80: f4 f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 > 00 00 > [ 477.777937] ^ > [ 477.778034] ffff880166b7ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > [ 477.778147] ffff880166b7ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 > [ 477.778260] > ================================================================== > [ 477.778376] Disabling lock debugging due to kernel taint > > All logs are here https://travis-ci.org/avagin/linux/jobs/290190646 > > Unfortunately vmlinux was not saved for this run. Thanks.
Here is vmlinux: https://www.dropbox.com/s/e70u6oxxj4pwe2h/vmlinux?dl=0