Dmitry Vyukov <[email protected]> writes:

> On Mon, Sep 14, 2020 at 2:15 PM Eric W. Biederman <[email protected]>
> wrote:
>>
>> syzbot <[email protected]> writes:
>>
>> > Hello,
>> >
>> > syzbot found the following issue on:
>>
>> Skimming the code it appears this is a feature not a bug.
>>
>> The stack_not_used code deliberately reads the unused/uninitialized
>> portion of the stack, to see if that part of the stack was used.
>>
>> Perhaps someone wants to make this play nice with KASAN?
>>
>> KASAN should be able to provide better information than reading the
>> stack to see if it is still zeroed out.
>>
>> Eric
>
> Hi Eric,
>
> Thanks for looking into this.
>
> There may be something else in play here. Unused parts of the stack
> are supposed to have zero shadow. The stack instrumentation code
> assumes that. If there is some garbage left in the shadow (like these
> "70 07 00 00 77" in this case), then it will lead to very obscure
> false positives later (e.g. some out-of-bounds on stack which can't be
> explained easily).
> If some code does something like "longjmp", then we should clear the
> stack at the point of longjmp. I think we did something similar for
> something called jprobes, but jprobes were removed at some point.
>
> Oh, wait, the reproducer uses /dev/fb. And as far as I understand
> /dev/fb smashes kernel memory left and right. So most likely it's some
> wild out of bounds write in /dev/fb.
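For readers unfamiliar with the check being discussed, here is a minimal
userspace sketch of the idea behind stack_not_used: scan upward from the
low end of a downward-growing stack and count how much is still zero. It is
a simplified model, not the kernel's code. The function name
model_stack_not_used, the array standing in for the stack, and the explicit
bounds check are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model: stack_end[0] is the lowest address of a
 * downward-growing stack and holds a canary word. Everything above it
 * that was never touched is assumed to still be zero.
 *
 * Returns the number of bytes from the low end of the stack up to the
 * first non-zero word after the canary, i.e. the part never used.
 */
size_t model_stack_not_used(const unsigned long *stack_end, size_t nwords)
{
	size_t i;

	if (nwords == 0)
		return 0;

	/* Skip the canary at index 0, then walk over zeroed words. */
	i = 1;
	while (i < nwords && stack_end[i] == 0)
		i++;

	return i * sizeof(unsigned long);
}
```

The point relevant to this thread is that such a scan necessarily reads
words the task never wrote, which is the kind of access a stack-aware
sanitizer may flag.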
So I am confused. The output in the console does not match the log
below. Further, the memory addresses in the report don't make a bit of
sense: they increment by 0x80 while only 16 bytes (0x10) are printed
per row.

I am simply responding to the fact that KASAN is complaining about an
out of bounds/uninitialized access in stack_not_used. Which seems a
legitimate thing to do, but that seems to indicate two debugging
primitives are fighting each other.

So why we have several very different traces I don't understand.
Unless you are right and something is causing corruption. At which
point this needs to be delivered to whoever can dig into this.

Eric

>> > HEAD commit: 729e3d09 Merge tag 'ceph-for-5.9-rc5' of
>> > git://github.com/..
>> > git tree: upstream
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=170a7cf1900000
>> > kernel config: https://syzkaller.appspot.com/x/.config?x=c61610091f4ca8c4
>> > dashboard link:
>> > https://syzkaller.appspot.com/bug?extid=d9ae84069cff753e94bf
>> > compiler: gcc (GCC) 10.1.0-syz 20200507
>> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10642545900000
>> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=141f2bed900000
>> >
>> > Bisection is inconclusive: the issue happens on the oldest tested release.
>> >
>> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17b9ffcd900000
>> > final oops: https://syzkaller.appspot.com/x/report.txt?x=1479ffcd900000
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=1079ffcd900000
>> >
>> > IMPORTANT: if you fix the issue, please add the following tag to the
>> > commit:
>> > Reported-by: [email protected]
>> >
>> > ==================================================================
>> > BUG: KASAN: unknown-crash in stack_not_used
>> > include/linux/sched/task_stack.h:101 [inline]
>> > BUG: KASAN: unknown-crash in check_stack_usage kernel/exit.c:692 [inline]
>> > BUG: KASAN: unknown-crash in do_exit+0x24a6/0x29f0 kernel/exit.c:849
>> > Read of size 8 at addr ffffc9000cf30130 by task syz-executor624/10359
>> >
>> > CPU: 1 PID: 10359 Comm: syz-executor624 Not tainted 5.9.0-rc4-syzkaller #0
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Call Trace:
>> > __dump_stack lib/dump_stack.c:77 [inline]
>> > dump_stack+0x198/0x1fd lib/dump_stack.c:118
>> > print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
>> > __kasan_report mm/kasan/report.c:513 [inline]
>> > kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>> > stack_not_used include/linux/sched/task_stack.h:101 [inline]
>> > check_stack_usage kernel/exit.c:692 [inline]
>> > do_exit+0x24a6/0x29f0 kernel/exit.c:849
>> > do_group_exit+0x125/0x310 kernel/exit.c:903
>> > get_signal+0x428/0x1f00 kernel/signal.c:2757
>> > arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
>> > exit_to_user_mode_loop kernel/entry/common.c:159 [inline]
>> > exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:190
>> > syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
>> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> > RIP: 0033:0x446b99
>> > Code: Bad RIP value.
>> > RSP: 002b:00007f70f5ed9d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
>> > RAX: 0000000000002878 RBX: 00000000006dbc58 RCX: 0000000000446b99
>> > RDX: 9999999999999999 RSI: 0000000000000000 RDI: 0000020002004ffc
>> > RBP: 00000000006dbc50 R08: ffffffffffffffff R09: 0000000000000000
>> > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc5c
>> > R13: 00007f70f5ed9d20 R14: 00007f70f5ed9d20 R15: 000000000000002d
>> >
>> >
>> > Memory state around the buggy address:
>> > ffffc9000cf30000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> > ffffc9000cf30080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> >>ffffc9000cf30100: 00 00 00 00 00 00 70 07 00 00 77 00 00 00 00 00
>> >                                     ^
>> > ffffc9000cf30180: 00 00 70 07 00 00 70 07 00 00 00 00 77 00 70 07
>> > ffffc9000cf30200: 00 70 07 00 77 00 00 00 00 00 70 07 00 00 00 00
>> > ==================================================================
>> >
>> >
>> > ---
>> > This report is generated by a bot. It may contain errors.
>> > See https://goo.gl/tpsmEJ for more information about syzbot.
>> > syzbot engineers can be reached at [email protected].
>> >
>> > syzbot will keep track of this issue. See:
>> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>> > For information about bisection process see:
>> > https://goo.gl/tpsmEJ#bisection
>> > syzbot can test patches for this issue, for details see:
>> > https://goo.gl/tpsmEJ#testing-patches
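One possible reading of the "Memory state around the buggy address" dump
quoted above, offered as a sketch rather than a definitive explanation: if
the rows show KASAN shadow bytes rather than the memory itself, and each
shadow byte describes 8 bytes of memory (the generic KASAN granule, which
is an assumption here since the thread does not state the KASAN mode), then
16 shadow bytes per row cover 16 * 8 = 128 = 0x80 bytes of address space.
That would account for rows stepping by 0x80 while printing only 16 values.
The helper names and constants below are hypothetical and exist only to
work through the arithmetic.

```c
#include <stdint.h>

/* Assumptions, not taken from the thread: generic KASAN with an 8-byte
 * granule, and a report row that prints 16 shadow bytes. */
#define GRANULE_SIZE   8u
#define SHADOW_PER_ROW 16u
#define ROW_SPAN       (GRANULE_SIZE * SHADOW_PER_ROW) /* 128 == 0x80 */

/* Memory address printed at the start of the row that covers addr. */
uint64_t report_row_base(uint64_t addr)
{
	return addr & ~(uint64_t)(ROW_SPAN - 1);
}

/* Zero-based index of the shadow byte within that row. */
unsigned int report_row_column(uint64_t addr)
{
	return (unsigned int)((addr & (ROW_SPAN - 1)) / GRANULE_SIZE);
}
```

Under those assumptions, the faulting address ffffc9000cf30130 falls in the
row labelled ffffc9000cf30100 at column 6, where the dump shows the value
70, the first of the non-zero bytes Dmitry pointed at.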

