Great, thanks for the explanations and info!

On Wed, Aug 14, 2019 at 4:32 PM Gabe Black <[email protected]> wrote:

> We might have some support for XSAVE, but I wasn't able to find it. We do
> decode it, but I'm pretty sure we return a WarnUnimplemented instruction. I
> was confusing XSAVE with FXSAVE earlier: FXSAVE is part of SSE, and we do at
> least partially support it. I don't think we support XSAVE, which saves a
> larger set of processor state components selected by a mask and by the XCR0
> register (which we also don't support), using a variable-size save area.
> Someone could add support for that, but it sounds like a lot of work.
>
> Gabe
>
> On Wed, Aug 14, 2019 at 3:55 PM Pouya Fotouhi <[email protected]>
> wrote:
>
>> I'm still digesting some of your points, but in general, something I
>> noticed in "newer" kernels is a "hard-coded" assumption about some of these
>> (especially security-related) features (take SMAP as an example). So, to my
>> understanding, if our CPUID simply says "I don't know", in some cases the
>> kernel interprets that as a yes rather than a no! So, again to my limited
>> knowledge, I think it'd be best to respond in the negative until we have
>> support for these features.
>>
>> Regarding XSAVE, if you recall the discussion we had about change 19892
>> <https://gem5-review.googlesource.com/c/public/gem5/+/19892>, our CPUID
>> returns 0x04000209 for 0_1. The most significant set bit is bit 26, which
>> tells the kernel we support XSAVE, so the kernel then tries to set bit 18
>> (OSXSAVE) in CR4. Correct me if I'm wrong, but my understanding was that
>> we have "some" support for XSAVE in gem5. Looking at my kernel logs,
>> though, the kernel seems to disable it after some tests during the SMP boot
>> process (probably our support isn't enough for the kernel, so it masks it
>> off).
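The bit arithmetic above can be checked with a minimal Python sketch. It assumes 0x04000209 is the ECX value returned for CPUID function 0_1 (XSAVE is ECX bit 26 in the Intel SDM, which matches the reasoning above). The table, `decode_features`, and `cr4_osxsave_requested` are hypothetical names, not gem5 or Linux code, and the kernel-side check is deliberately simplified.

```python
# Hypothetical helper, not gem5 code: decode a few CPUID function 0_1 ECX
# feature bits (bit positions as documented in the Intel SDM).
CPUID1_ECX_BITS = {
    0: "SSE3",
    3: "MONITOR",
    9: "SSSE3",
    26: "XSAVE",
    27: "OSXSAVE",
}

# CR4.OSXSAVE is bit 18.
CR4_OSXSAVE = 1 << 18


def decode_features(value: int, table: dict) -> list:
    """Return the names of the table's bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


def cr4_osxsave_requested(ecx: int) -> int:
    """Simplified: the CR4 bit a kernel would try to set when XSAVE is advertised."""
    return CR4_OSXSAVE if ecx & (1 << 26) else 0


ecx = 0x04000209
print(decode_features(ecx, CPUID1_ECX_BITS))  # ['SSE3', 'MONITOR', 'SSSE3', 'XSAVE']
print(ecx.bit_length() - 1)                   # 26, the most significant set bit
print(hex(cr4_osxsave_requested(ecx)))        # 0x40000, i.e. CR4 bit 18
print(hex(ecx & ~(1 << 26)))                  # 0x209 once the XSAVE bit is cleared
```

Clearing bit 26 is the kind of "extra bit" change discussed in this thread; with it cleared, the simplified check no longer asks for CR4.OSXSAVE.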
>>
>> Best,
>>
>> On Wed, Aug 14, 2019 at 3:28 PM Gabe Black <[email protected]> wrote:
>>
>>> Actually it looks like somebody added a new function while skipping over
>>> the ones below. That's how the unimplemented functions slipped through. I'm
>>> not going to try to implement those for now, but I don't want to discourage
>>> anyone that wants to do something with them.
>>>
>>> I'm also looking into why the kernel thinks we support xsave (which
>>> seems to be fairly complicated) when we do not. I think there's just an
>>> extra bit set in CPUID I need to turn off.
>>>
>>> Gabe
>>>
>>> On Wed, Aug 14, 2019 at 3:01 PM Gabe Black <[email protected]> wrote:
>>>
>>>> I was actually just looking at this, since I noticed that one of the x86
>>>> kernels I have lying around was crashing with an undefined opcode
>>>> exception. I see that the doCpuid function just bails out for some of
>>>> the functions below the largest one it supports (so that it can support
>>>> the extended functions). In that case the CPUID instruction leaves EAX,
>>>> EBX, ECX and EDX unmodified, since it isn't supposed to raise any type of
>>>> fault. Because we told the kernel those functions were supported, it will
>>>> try to interpret those registers as an actual answer and, depending on
>>>> what executed before, do something arbitrary. We should definitely stop
>>>> doing that for starters. I think this is something I partially
>>>> implemented a long time ago because it was blocking boot, and then never
>>>> went back and filled out. For some of these functions we may not have
>>>> good answers, for instance when reporting cache sizes. I'm not sure what
>>>> to do in that case. We may need to look at those fields one by one and
>>>> try to come up with safe, fairly inert answers. If we can return
>>>> something that says "I don't know", that would be best.
>>>>
>>>> The specific case I'm looking at, though, is function 0xd, which we would
>>>> have told the kernel we don't support. It's also passing through stale
>>>> register values, which gives bad answers too.
>>>>
>>>> I'll put up some CLs which fill out function constants we don't yet
>>>> have, return 0 when we don't get an answer from doCpuid, and start looking
>>>> at what the unimplemented functions should return. We can build on that to
>>>> add in functions that are missing so the kernel at least stops tripping
>>>> over itself when it gets nonsensical answers from CPUID.
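The difference between bailing out and returning 0 can be sketched in a few lines of Python. This assumes a simple table of handlers keyed by function number; `cpuid_old`, `cpuid_new`, and the handler table are hypothetical and do not reflect gem5's actual doCpuid interface.

```python
from typing import Callable, Dict, Tuple

# (eax, ebx, ecx, edx)
Regs = Tuple[int, int, int, int]
Handlers = Dict[int, Callable[[], Regs]]


def cpuid_old(function: int, handlers: Handlers, stale_regs: Regs) -> Regs:
    """Behavior described above: with no handler, the registers are left unmodified."""
    handler = handlers.get(function)
    return handler() if handler is not None else stale_regs


def cpuid_new(function: int, handlers: Handlers, stale_regs: Regs) -> Regs:
    """Proposed fix: an unanswered function reports all zeros ("nothing supported")."""
    handler = handlers.get(function)
    return handler() if handler is not None else (0, 0, 0, 0)


# Hypothetical table: only function 0x1 is implemented, with placeholder values.
handlers: Handlers = {0x1: lambda: (0x1, 0x2, 0x3, 0x4)}
leftover: Regs = (0x2, 0xDEAD, 0xBEEF, 0x1234)  # whatever executed before CPUID

print(cpuid_old(0xD, handlers, leftover))  # stale values leak through
print(cpuid_new(0xD, handlers, leftover))  # (0, 0, 0, 0)
```

Zeroing is only a safe default for functions where 0 means "not supported"; leaves such as the cache-size ones may still need hand-picked inert values.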
>>>>
>>>> Gabe
>>>>
>>>> On Wed, Aug 14, 2019 at 2:01 PM Pouya Fotouhi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> During kernel boot with the timing, atomic, and O3 CPU models, I get the
>>>>> following kernel oops in native_flush_tlb_global. Looking closer at the
>>>>> issue, the Exec traces show:
>>>>>
>>>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96    : mov    eax, 0x2
>>>>> 2014093750: system.cpu A0 T0 : @native_flush_tlb_global+96.0  : MOV_R_I : limm   eax, 0x2 : IntAlu :  D=0x0000000000000002  flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
>>>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101    : ud2
>>>>> 2014094250: system.cpu A0 T0 : @native_flush_tlb_global+101.0  :   UD2 : fault   Invalid-Opcode : No_OpClass :  flags=(IsMicroop|IsLastMicroop|IsFirstMicroop)
>>>>> 2014094500: system.cpu A0 T0 : @native_flush_tlb_global+101.32768 : Microcode_ROM : slli   t4, t1, 0x4 : IntAlu :  D=0x0000000000000060  flags=(IsInteger|IsMicroop|IsDelayedCommit)
>>>>>
>>>>> Looking at the decode of the "undefined" instruction raising the
>>>>> fault:
>>>>> 2014094250: system.cpu: Decode: Decoded fault instruction:
>>>>> {
>>>>>         leg = 0x10,
>>>>>         rex = 0,
>>>>>         vex/xop = 0,
>>>>>         op = {
>>>>>                 type = three byte 0f38,
>>>>>                 op = 0x82,
>>>>>                 },
>>>>>         modRM = 0,
>>>>>         sib = 0,
>>>>>         immediate = 0,
>>>>>         displacement = 0
>>>>>         dispSize = 0}
>>>>>
>>>>> This appears to be INVPCID, and a disassembly of native_flush_tlb_global
>>>>> confirms it:
>>>>>
>>>>>    0xffffffff81033a68 <+96>:    mov    $0x2,%eax
>>>>>    0xffffffff81033a6d <+101>:   invpcid (%rcx),%rax
>>>>>    0xffffffff81033a72 <+106>:   add    $0x18,%rsp
>>>>>
>>>>> We do not implement this instruction, and support for it is reported by
>>>>> CPUID function 0_7 (EBX bit 10), which we also do not implement.
>>>>>
>>>>> I also have a different, yet related, issue with the SMAP and FSGSBASE
>>>>> bits in CR4 (bits 21 and 16): the kernel tries to set them, which results
>>>>> in a fault our CPUs can't handle, and the kernel panics. These features
>>>>> are also reported by CPUID function 0_7, which we do not implement.
>>>>>
>>>>> I was wondering whether it would be safe to simply return 0s for function
>>>>> 0_7. I checked, and I couldn't find anything that conflicts with the
>>>>> functionality we support in gem5. However, I would appreciate it if
>>>>> someone more familiar with our x86 support could double-check
>>>>> https://www.sandpile.org/x86/cpuid.htm#level_0000_0007h and verify
>>>>> that returning 0s would be fine here.
>>>>>
>>>>> For the corner case my kernel was hitting, I tested returning 0s, and it
>>>>> gets me past both of these issues. Upon confirmation from someone in the
>>>>> community, I can proceed and submit the change.
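Why all-zero answers for function 0_7 keep a kernel away from these features can be sketched in Python. This assumes the kernel first checks that the maximum standard function (CPUID 0_0 EAX) is at least 7 and then tests individual EBX bits of function 0_7, sub-leaf 0 (FSGSBASE bit 0, SMEP bit 7, INVPCID bit 10, SMAP bit 20, per the Intel SDM). `kernel_may_use` is a hypothetical, simplified stand-in for the kernel's feature detection, and 0xd is only an example maximum.

```python
# CPUID function 0_7 (sub-leaf 0) EBX feature bits, per the Intel SDM.
LEAF7_EBX_BITS = {"FSGSBASE": 0, "SMEP": 7, "INVPCID": 10, "SMAP": 20}


def kernel_may_use(feature: str, max_std_function: int, leaf7_ebx: int) -> bool:
    """Simplified kernel-side check: function 0_7 is consulted only if it is in range."""
    if max_std_function < 7:
        return False
    return bool(leaf7_ebx & (1 << LEAF7_EBX_BITS[feature]))


# With EBX = 0 for function 0_7, every one of these features reads as absent,
# even when the reported maximum standard function (0xd here, as an example)
# covers function 0_7.
for name in LEAF7_EBX_BITS:
    print(name, kernel_may_use(name, 0xD, 0))  # always False
```

The same check also shows the failure mode from earlier in the thread: if function 0_7 is in range but EBX holds leftover values, any of these bits may appear set.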
>>>>>
>>>>> Best,
>>>>> --
>>>>> Pouya Fotouhi
>>>>> PhD Candidate
>>>>> Department of Electrical and Computer Engineering
>>>>> University of California, Davis
>>>>>
>>>>
>>
>>
>

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
