Great, thanks for the explanations and info!

On Wed, Aug 14, 2019 at 4:32 PM Gabe Black <[email protected]> wrote:
> We might have some support for xsave but I wasn't able to find it. We do
> decode it, but I'm pretty sure we return a WarnUnimplemented instruction. I
> was confusing XSAVE with FXSAVE before, where FXSAVE is part of SSE and
> which we do at least partially support. I don't think we have support for
> XSAVE, which is another thing that saves a bunch of processor state as
> selected with a mask and, I think, an XCR0 register (which we also don't
> support) with variable sizes, etc., etc. Someone could add support for that,
> but it sounds like a lot of work.
>
> Gabe
>
> On Wed, Aug 14, 2019 at 3:55 PM Pouya Fotouhi <[email protected]>
> wrote:
>
>> I'm still digesting some of your points, but in general, something I
>> noticed in "newer" kernels is the "hard-coded" assumption for some of these
>> (especially security-related) features (take SMAP as an example). So, to my
>> understanding, if our CPUID simply says "I don't know", in some cases the
>> kernel interprets that as a yes rather than a no! So, again to my limited
>> knowledge, I think it'd be best to respond negatively until we have support
>> for these features.
>>
>> Regarding xsave, if you recall the discussions we had about change 19892
>> <https://gem5-review.googlesource.com/c/public/gem5/+/19892>, our CPUID
>> returns 0x04000209 for 0_1. The most significant set bit we have is bit 26,
>> which tells the kernel we do have support for xsave, and then the kernel
>> tries to set bit 18 in CR4. Correct me if I'm wrong, but my understanding was
>> that we have "some" support for xsave in gem5. That said, looking at my
>> kernel logs, the kernel seems to disable it after some tests during the SMP
>> boot process (probably our support is not enough for the kernel and it masks
>> it off).
>>
>> Best,
>>
>> On Wed, Aug 14, 2019 at 3:28 PM Gabe Black <[email protected]> wrote:
>>
>>> Actually it looks like somebody added a new function while skipping over
>>> the ones below. That's how the unimplemented functions slipped through.
>>> I'm not going to try to implement those for now, but I don't want to
>>> discourage anyone who wants to do something with them.
>>>
>>> I'm also looking into why the kernel thinks we support xsave (which
>>> seems to be fairly complicated) when we do not. I think there's just an
>>> extra bit set in CPUID I need to turn off.
>>>
>>> Gabe
>>>
>>> On Wed, Aug 14, 2019 at 3:01 PM Gabe Black <[email protected]> wrote:
>>>
>>>> I was actually just looking at this since I noticed that one of the x86
>>>> kernels I have lying around was crashing with an undefined opcode
>>>> exception. I see that the doCpuid function will just bail out for some of
>>>> the functions which are below the largest it supports (so it can support
>>>> the extended functions). The CPUID instruction will just leave EAX, EBX,
>>>> ECX and EDX unmodified in this case since it isn't supposed to raise any
>>>> type of fault. The kernel will try to interpret those fields as an actual
>>>> answer since we told it those functions were supported and, depending on
>>>> what executed before it, do something arbitrary. We should definitely stop
>>>> doing that, for starters. I think this is something I partially implemented
>>>> since it was blocking boot a long time ago, and then never went back and
>>>> filled out. For some of these functions we may not have good answers, for
>>>> instance when reporting cache sizes. I'm not sure what to do in that case.
>>>> We may need to look at those fields one by one and try to come up with
>>>> safe, fairly inert answers. If we can return something that says "I don't
>>>> know", that would be best.
>>>>
>>>> The specific case I'm looking at is function 0xd though, which we would
>>>> have told the kernel we don't support. That's also passing through its
>>>> values, which is also giving bad answers.
>>>>
>>>> I'll put up some CLs which fill out function constants we don't yet
>>>> have, return 0 when we don't get an answer from doCpuid, and start looking
>>>> at what the unimplemented functions should return. We can build on that to
>>>> add in functions that are missing so the kernel at least stops tripping
>>>> over itself when it gets nonsensical answers from CPUID.
>>>>
>>>> Gabe
>>>>
>>>> On Wed, Aug 14, 2019 at 2:01 PM Pouya Fotouhi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> During kernel boot with the timing/atomic/O3 CPU models, I get the
>>>>> following kernel oops at native_flush_tlb_global. Looking closer at the
>>>>> issue, the Exec traces show:
>>>>>
>>>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96 : mov eax, 0x2
>>>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96.0 : MOV_R_I : limm eax, 0x2 : IntAlu : D=0x0000000000000002 flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
>>>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101 : ud2
>>>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101.0 : UD2 : fault Invalid-Opcode : No_OpClass : flags=(IsMicroop|IsLastMicroop|IsFirstMicroop)
>>>>> 2014094500: system.cpu A0 T0 : @native_flush_tlb_global+101.32768 : Microcode_ROM : slli t4, t1, 0x4 : IntAlu : D=0x0000000000000060 flags=(IsInteger|IsMicroop|IsDelayedCommit)
>>>>>
>>>>> Looking at the decode of the "undefined" instruction raising the
>>>>> fault:
>>>>> 2014094250: system.cpu: Decode: Decoded fault instruction:
>>>>> {
>>>>>     leg = 0x10,
>>>>>     rex = 0,
>>>>>     vex/xop = 0,
>>>>>     op = {
>>>>>         type = three byte 0f38,
>>>>>         op = 0x82,
>>>>>     },
>>>>>     modRM = 0,
>>>>>     sib = 0,
>>>>>     immediate = 0,
>>>>>     displacement = 0
>>>>>     dispSize = 0}
>>>>>
>>>>> This is apparently invpcid, and a dump of native_flush_tlb_global
>>>>> confirms it:
>>>>>
>>>>> 0xffffffff81033a68 <+96>: mov $0x2,%eax
>>>>> 0xffffffff81033a6d <+101>: invpcid (%rcx),%rax
>>>>> 0xffffffff81033a72 <+106>: add $0x18,%rsp
>>>>>
>>>>> We do not implement this instruction, and it seems like this
>>>>> functionality is reported in function 0_7 of CPUID (which we do not
>>>>> implement).
>>>>>
>>>>> I also have a different, yet related, issue with the SMAP and FSGSBASE
>>>>> bits (bits 20 and 16 in CR4): the kernel tries to set them, which results
>>>>> in a fault that our CPUs can't handle, and the kernel panics. These
>>>>> features are also reported by function 0_7 of CPUID, which we do not
>>>>> implement.
>>>>>
>>>>> Would it be safe to simply return 0s for function 0_7? I checked, and I
>>>>> couldn't find anything that conflicts with the functionality we support in
>>>>> gem5. However, I would appreciate it if someone more familiar with our x86
>>>>> support could double-check
>>>>> https://www.sandpile.org/x86/cpuid.htm#level_0000_0007h and verify
>>>>> that returning 0s would be fine here.
>>>>>
>>>>> For the corner case my kernel was hitting, I tested it, and returning 0s
>>>>> gets me past both of these issues. Once someone in the community
>>>>> confirms, I can proceed and submit the change.
>>>>>
>>>>> Best,
>>>>> --
>>>>> Pouya Fotouhi
>>>>> PhD Candidate
>>>>> Department of Electrical and Computer Engineering
>>>>> University of California, Davis
>>>>>
>>>>
>>
>> --
>> Pouya Fotouhi
>> PhD Candidate
>> Department of Electrical and Computer Engineering
>> University of California, Davis
>>
>
--
Pouya Fotouhi
PhD Candidate
Department of Electrical and Computer Engineering
University of California, Davis
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
