So here's a nasty bug of some sort. I'm in the middle of making some changes to backtrace so we can easily output user backtraces. In the process, I started running into an issue where the backtrace wouldn't make progress and would get stuck on the second entry in the BT. It doesn't always happen, either.
I narrowed it down to these instructions (hacked up a bit, nops and whatnot):

    ffffffffc2100081:  b9 10 00 00 00   mov    $0x10,%ecx
    ffffffffc2100086:  90               nop
    ffffffffc2100087:  90               nop
    ffffffffc2100088:  90               nop
    ffffffffc2100089:  45 31 c0         xor    %r8d,%r8d
    ffffffffc210008c:  4c 89 e7         mov    %r12,%rdi
    ffffffffc210008f:  4c 89 fe         mov    %r15,%rsi
    ffffffffc2100092:  f3 a4            rep movsb %ds:(%rsi),%es:(%rdi)
    ffffffffc2100094:  90               nop
    ffffffffc2100095:  90               nop
    ffffffffc2100096:  90               nop
    ffffffffc2100097:  4c 8b 4d c0      mov    -0x40(%rbp),%r9
    ffffffffc210009b:  4d 39 f9         cmp    %r15,%r9
    ffffffffc210009e:  74 30            je     ffffffffc21000d0 <backtrace_user_list+0x80>

That last bit is a jump to a while(1) loop, so I can look at things in qemu. The check is whether or not some stack variable (the one we copied into) changed; its address is rdi == r12. The value should change (based on the program), and when we look at the state of the machine, it's not clear why it didn't. Other than r9 and the flags from the cmp, our state should be the same as it was right after the rep movsb.
    (qemu) info registers
    RAX=ffff80013eb5adf0 RBX=0000000000000001 RCX=0000000000000000 RDX=ffff80013eb5adf0
    RSI=00007f7fffbfef50 RDI=ffff80013eb5ada0 RBP=ffff80013eb5ade0 RSP=ffff80013eb5ad80
    R8 =0000000000000000 R9 =00007f7fffbfef50 R10=ffff8000000b8f00 R11=ffff8000000b8ec0
    R12=ffff80013eb5ada0 R13=0000000000000014 R14=0000000000401a86 R15=00007f7fffbfef50
    RIP=ffffffffc21000d0 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0000 0000000000000000 ffffffff 00c00000
    CS =0008 0000000000000000 00000000 00209900 DPL=0 CS64 [--A]
    SS =0000 0000000000000000 ffffffff 00c00000
    DS =0000 0000000000000000 ffffffff 00c00000
    FS =0000 00004000005d90c0 ffffffff 00c00000
    GS =0000 ffffffffc6c8b7c0 ffffffff 00c00000
    LDT=0000 0000000000000000 ffffffff 00c00000
    TR =0028 ffffffffc6db4380 00000068 00008b00 DPL=0 TSS64-busy
    GDT=     ffff800000101000 00000037
    IDT=     ffffffffc6c89f10 00000fff
    CR0=80010033 CR2=000000000061d000 CR3=000000013ecca000 CR4=000007b0
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000501
    FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
    FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
    FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
    FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
    FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
    XMM00=0000000000000000ff00000000000000 XMM01=25252525252525252525252525252525
    XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
    XMM04=0000000000000000ff00000000000000 XMM05=00000000000000000000000000000000
    XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
    XMM08=00000000000000000000000000000000 XMM09=ffffffffffffff00ffffffffffffff00
    XMM10=ffffffffffffff00ffffffffffffff00 XMM11=ffffffffffffffffffffffffffffff00
    XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
    XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000

Note that:

    RDI = R12 = the destination of the rep movsb
    RSI = R15 = the source of the rep movsb
    RCX = 0 (it was 16), meaning that we did our reps

Here's the destination hexdump:

    (qemu) x /32wx 0xffff80013eb5ada0
    ffff80013eb5ada0: 0xffbfef50 0x00007f7f 0x00401a86 0x00000000
    ffff80013eb5adb0: 0x3eb5ade0 0xffff8001 0x00000000 0x00000000
    ffff80013eb5adc0: 0x00000000 0x00000000 0x3eb5af40 0xffff8001
    ffff80013eb5add0: 0xc6c8b7c0 0xffffffff 0x00401990 0x00000000
    ffff80013eb5ade0: 0x3eb5aeb0 0xffff8001 0xc200d341 0xffffffff
    ffff80013eb5adf0: 0x00402f59 0x00000000 0x00000000 0x00000000
    ffff80013eb5ae00: 0x80000001 0x00000000 0x3eb5af40 0xffff8001
    ffff80013eb5ae10: 0x000003d4 0x00000000 0x00003ab1 0x00000000

Here's the source hexdump:

    (qemu) x /32wx 0x00007f7fffbfef50
    00007f7fffbfef50: 0x00000000 0x00000000 0x004020c1 0x00000000
    00007f7fffbfef60: 0xffbfef68 0x00007f7f 0x0000001c 0x00000000
    00007f7fffbfef70: 0x00000001 0x00000000 0xffbfefe8 0x00007f7f
    00007f7fffbfef80: 0x00000000 0x00000000 0xffbfeff5 0x00007f7f
    00007f7fffbfef90: 0x00000000 0x00000000 0x00000003 0x00000000
    00007f7fffbfefa0: 0x00400040 0x00000000 0x00000004 0x00000000
    00007f7fffbfefb0: 0x00000020 0x00000000 0x00000005 0x00000000
    00007f7fffbfefc0: 0x00000009 0x00000000 0x00000009 0x00000000

The first 16 bytes should be the same. Had we actually copied the src into the dst, then the program (backtrace) would work, but it looks like we just silently ignored it.

Note this is a hacked up copy_from_user(), where it is just a __user_memcpy(), which is what happens when you do a count of, say, 16 bytes (as in this case). You can see from r8 == 0 that there was no error. I also had a printk in the try_exception_fixup, just in case.

Does anyone know of a reason why rep movsb might not work? It sounds crazy. (I also tested on hardware, and it seems to do the same, though I can't inspect the state easily.) Likewise, there's probably something I'm doing wrong.

Also note that this code runs from IRQ context (CTRL-B backspace).
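For anyone who wants to poke at the instruction outside the kernel, here is a minimal userspace sketch of what a completed rep movsb leaves behind in its registers. It is not the Akaros __user_memcpy(); the names rep_movsb() and struct movsb_result are made up for illustration, and it assumes GCC or clang on x86-64 (with a plain memcpy() model as a fallback elsewhere). With DF clear, rep movsb counts RCX down to 0 and also advances RSI and RDI by the count, so after a full 16-byte copy they would be expected to sit 16 bytes past the original source and destination. That may be worth comparing against the RSI/RDI values in the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the register state left by the copy. */
struct movsb_result {
    const void *src_end;  /* RSI after the copy */
    void *dst_end;        /* RDI after the copy */
    size_t remaining;     /* RCX after the copy */
};

/* Hypothetical helper: copy n bytes with rep movsb and report the
 * final RSI/RDI/RCX values. */
static struct movsb_result rep_movsb(void *dst, const void *src, size_t n)
{
    struct movsb_result r;

    assert(dst != NULL && src != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "+D", "+S" and "+c" pin dst, src and n to RDI, RSI and RCX as
     * read-write operands, so the updated values come back to C.
     * Assumes DF is clear, which the SysV ABI guarantees on function
     * entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Fallback model of the same architectural effect. */
    memcpy(dst, src, n);
    dst = (char *)dst + n;
    src = (const char *)src + n;
    n = 0;
#endif
    r.src_end = src;
    r.dst_end = dst;
    r.remaining = n;
    return r;
}
```

Calling rep_movsb(dst, src, 16) on two 16-byte buffers and checking that src_end and dst_end both moved by 16 while remaining is 0 gives a baseline for what "right after the rep movsb" looks like when the copy really ran.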
a super-nasty commit with all of my debugging crap is at origin/nasty-bug if anyone wants to take a look. (i also do an ash ifconfig and epoll_server, which is just some crap to get the user to spin somewhere with a bit of a stack).

barret
