Actually, it looks like somebody added a new function while skipping over the ones below it. That's how the unimplemented functions slipped through. I'm not going to try to implement those for now, but I don't want to discourage anyone who wants to do something with them.

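
To make the failure mode concrete, here is a rough Python sketch of the situation. It is not gem5 code: the table contents, `do_cpuid`, `cpuid_passthrough`, `cpuid_zero_fill` and `holes` are all made up for illustration, and the only assumption carried over from the thread is that an unanswered function currently leaves EAX, EBX, ECX and EDX untouched.

```python
from typing import Optional, Tuple

Regs = Tuple[int, int, int, int]  # (eax, ebx, ecx, edx)

# Hypothetical table: function 0 advertises 0x7 as the largest standard
# function, but only functions 0 and 1 actually have answers. The register
# values are placeholders, not what gem5 really reports.
IMPLEMENTED = {
    0x0: (0x7, 0, 0, 0),
    0x1: (0x0, 0, 0, 0),
}


def do_cpuid(function: int) -> Optional[Regs]:
    """Return register values, or None when the function has no answer."""
    return IMPLEMENTED.get(function)


def cpuid_passthrough(regs: Regs) -> Regs:
    """Problematic behavior: leave the registers untouched on a miss."""
    result = do_cpuid(regs[0])
    return regs if result is None else result


def cpuid_zero_fill(regs: Regs) -> Regs:
    """Safer behavior: report all zeros on a miss."""
    result = do_cpuid(regs[0])
    return (0, 0, 0, 0) if result is None else result


def holes() -> list:
    """Functions at or below the advertised maximum that have no answer."""
    max_function = IMPLEMENTED[0x0][0]
    return [f for f in range(max_function + 1) if f not in IMPLEMENTED]


# Whatever happened to be in the registers before CPUID executed.
stale = (0x7, 0xDEADBEEF, 0x0, 0x12345678)
print(cpuid_passthrough(stale))  # stale values come back as the "answer"
print(cpuid_zero_fill(stale))    # every field reads as zero
print([hex(f) for f in holes()])
```

With the pass-through version the caller sees leftover register contents as feature bits; with the zero-fill version every unanswered function simply reports nothing.
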
I'm also looking into why the kernel thinks we support xsave (which seems
to be fairly complicated) when we do not. I think there's just an extra
bit set in CPUID that I need to turn off.

Gabe

On Wed, Aug 14, 2019 at 3:01 PM Gabe Black <[email protected]> wrote:

> I was actually just looking at this, since I noticed that one of the x86
> kernels I have lying around was crashing with an undefined opcode
> exception. I see that the doCpuid function will just bail out for some of
> the functions which are below the largest it supports (so it can support
> the extended functions). The CPUID instruction will just leave EAX, EBX,
> ECX and EDX unmodified in this case, since it isn't supposed to raise any
> type of fault. The kernel will try to interpret those fields as an actual
> answer, since we told it those functions were supported, and, depending
> on what executed before it, do something arbitrary. We should definitely
> stop doing that for starters. I think this is something I partially
> implemented since it was blocking boot a long time ago, and then never
> went back and filled out. For some of these functions we may not have
> good answers, for instance when reporting cache sizes. I'm not sure what
> to do in that case. We may need to look at those fields one by one and
> try to come up with safe, fairly inert answers. If we can return
> something that says "I don't know", that would be best.
>
> The specific case I'm looking at is function 0xd though, which we would
> have told the kernel we don't support. That's also passing through its
> values, which is also giving bad answers.
>
> I'll put up some CLs which fill out function constants we don't yet have,
> return 0 when we don't get an answer from doCpuid, and start looking at
> what the unimplemented functions should return. We can build on that to
> add in functions that are missing, so the kernel at least stops tripping
> over itself when it gets nonsensical answers from CPUID.
>
> Gabe
>
> On Wed, Aug 14, 2019 at 2:01 PM Pouya Fotouhi <[email protected]>
> wrote:
>
>> Hi All,
>>
>> During kernel boot up with the timing/atomic/O3 CPU modes I get the
>> following kernel oops at native_flush_tlb_global. Looking closer at the
>> issue, Exec traces show:
>>
>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96 : mov eax, 0x2
>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96.0 : MOV_R_I : limm eax, 0x2 : IntAlu : D=0x0000000000000002 flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101 : ud2
>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101.0 : UD2 : fault Invalid-Opcode : No_OpClass : flags=(IsMicroop|IsLastMicroop|IsFirstMicroop)
>> 2014094500: system.cpu A0 T0 : @native_flush_tlb_global+101.32768 : Microcode_ROM : slli t4, t1, 0x4 : IntAlu : D=0x0000000000000060 flags=(IsInteger|IsMicroop|IsDelayedCommit)
>>
>> Looking at the decode of the "undefined" instruction raising the fault:
>>
>> 2014094250: system.cpu: Decode: Decoded fault instruction:
>> {
>>     leg = 0x10,
>>     rex = 0,
>>     vex/xop = 0,
>>     op = {
>>         type = three byte 0f38,
>>         op = 0x82,
>>     },
>>     modRM = 0,
>>     sib = 0,
>>     immediate = 0,
>>     displacement = 0
>>     dispSize = 0}
>>
>> Which apparently is invpcid, and a dump of native_flush_tlb_global
>> confirms:
>>
>> 0xffffffff81033a68 <+96>:  mov     $0x2,%eax
>> 0xffffffff81033a6d <+101>: invpcid (%rcx),%rax
>> 0xffffffff81033a72 <+106>: add     $0x18,%rsp
>>
>> We do not implement this instruction, and it seems like this
>> functionality is reported in function 0_7 of CPUID (which we do not
>> implement).
>>
>> I also have a different, yet related, issue with SMAP and FSGSBASE bits
>> (bits 20 and 16 in CR4), where the kernel tries to set those, resulting
>> in a fault which our CPUs can't handle, and the kernel panics upon them.
>> These functionalities are also reported by function 0_7 in CPUID, which
>> we do not implement.
>>
>> I was wondering if it would be safe to simply return 0s for function 0_7?
>> I checked, and I couldn't find anything violating the functionalities we
>> support in gem5. However, I would appreciate it if someone more familiar
>> with our support for x86 could double check
>> https://www.sandpile.org/x86/cpuid.htm#level_0000_0007h and verify that
>> returning 0s would be fine here.
>>
>> For the corner case my kernel was hitting, I tested, and returning 0s
>> would get me past both these issues. Upon confirmation from someone in
>> the community, I can proceed and submit the change.
>>
>> Best,
>> --
>> Pouya Fotouhi
>> PhD Candidate
>> Department of Electrical and Computer Engineering
>> University of California, Davis
>>
>
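
As a small illustration of why returning 0s for function 0_7 should keep the kernel away from these features, here is a hedged Python sketch that decodes the relevant EBX flags. The bit positions are the CPUID ones from the architecture manuals (they are not the same as the CR4 bit numbers), and `reported_features` and `clear_bit` are hypothetical helpers, not gem5 functions. The last step assumes the "extra bit" behind the xsave confusion is the XSAVE flag in function 1 (ECX bit 26); the thread does not confirm that.

```python
# CPUID function 7, sub-function 0, EBX feature flags (CPUID bit positions,
# which differ from the corresponding CR4 enable bits).
LEAF7_EBX_BITS = {
    "fsgsbase": 0,
    "smep": 7,
    "invpcid": 10,
    "smap": 20,
}

# CPUID function 1, ECX: XSAVE feature flag.
XSAVE_BIT = 26


def reported_features(ebx: int) -> list:
    """Names of the features above that a given EBX value advertises."""
    return sorted(name for name, bit in LEAF7_EBX_BITS.items()
                  if ebx & (1 << bit))


def clear_bit(value: int, bit: int) -> int:
    """Return a 32-bit register value with one flag turned off."""
    return value & ~(1 << bit) & 0xFFFFFFFF


# All zeros: none of the features are advertised, so a kernel that checks
# CPUID first has no reason to execute invpcid or set the matching CR4 bits.
print(reported_features(0))

# A value with INVPCID and SMAP set, for comparison.
print(reported_features((1 << 10) | (1 << 20)))

# Turning off XSAVE in a made-up function 1 ECX value.
print(hex(clear_bit((1 << XSAVE_BIT) | 0x1, XSAVE_BIT)))
```

This only shows what the kernel is told; whether a given kernel honors every one of these flags before using the feature is something I have not checked.
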
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
