Sorry, one more thing I missed pointing out:

vmx_setup_constant_host_state() reads MSR_IA32_CR_PAT. I am assuming this
happens way after early initialization, but not sure what the effect of
changing the
PAT is on this logic.


On Thu, Mar 10, 2016 at 12:21 PM, Kanoj Sarcar <[email protected]> wrote:

>
>
> On Thu, Mar 10, 2016 at 11:56 AM, Barret Rhoden <[email protected]>
> wrote:
>
>> On 2016-03-10 at 11:28 "'Kanoj Sarcar' via Akaros"
>> <[email protected]> wrote:
>> >
>> > MLX uses WC to write data payload into HCA buffers. The performance
>> > boost over UC/UC- is significant. PAT details are added as comments
>> > above pat_init() routine.
>>
>> Nice find!
>>
>> > +
>> > +void pat_init(void)
>> > +{
>> > +     unsigned long patval;
>> > +
>> > +     patval = read_msr(MSR_IA32_CR_PAT);
>> > +     printk("PAT : origvalue = 0x%llx\n", patval);
>> > +
>> > +     /* Clear PAT1/5 */
>> > +     patval &= 0xFFFF00FFFFFF00FFULL;
>> > +
>> > +     /* Set PAT1/5 to WC */
>> > +     patval |= (1 << 8);
>> > +     patval |= (1ULL << 40);
>> > +
>> > +     printk("PAT : newvalue = 0x%llx\n", patval);
>> > +     write_msr(MSR_IA32_CR_PAT, patval);
>>
>> This is probably dangerous.  Does the PAT MSR apply to all cores, or
>> just the calling core?  I think it might be just the calling core, and
>> it's probably a bad idea to have the cores disagree on the PAT
>> settings.
>>
>
> Agreed. Note that I created pat_init() as a routine, that can migrate into
> arch code and be
> invoked from the correct spot. I suggest we integrate the patch as is (its
> just me that is
> probably turning on this support anyway :-)), then transport the
> pat_init() call to a better
> location. See below:
>
> From section 11.12.4 in SDM:
> The operating system is responsible for insuring that changes to a PAT
> entry occur in a manner that maintains the
> consistency of the processor caches and translation lookaside buffers
> (TLB). This is accomplished by following the
> procedure as specified in Section 11.11.8, “MTRR Considerations in MP
> Systems,” for changing the value of an
> MTRR in a multiple processor system. It requires a specific sequence of
> operations that includes flushing the
> processors caches and TLBs.
>
> 11.11.8 MTRR Considerations in MP Systems
> In MP (multiple-processor) systems, the operating systems must maintain
> MTRR consistency between all the
> processors in the system. The Pentium 4, Intel Xeon, and P6 family
> processors provide no hardware support to
> maintain this consistency. In general, all processors must have the same
> MTRR values.
>
> So two things: all processors must agree on the setting, and cache/TLB
> flush must happen afterwards.
>
>
>> It makes sense that we need a WC setting in PAT, and we don't have one
>> at boot-up.  How about we do the PAT setup in __arch_pcpu_init()?
>> (using a helper like the one you've got sounds fine).  I'd rather have
>> this taken care of in the main OS, since its something we need
>> regardless of the Linux drivers.
>>
>
> Agreed. I don't know whether   __arch_pcpu_init() obeys the "flush $/tlb"
> afterwards rule ...
>
>
>> Also, do we need both PAT1 and PAT5 set to WT?  Could we leave PAT 1 as
>> WT (so if you select PTE bits PTE_PWT you get WT) and set PAT 5 to WC
>> (bits PTE_PAT|PTE_PWT).  I'm not particularly tied to any of the PAT
>> locations/numbers; I'm just trying to keep WT as an option in our PAT
>> table.
>>
>
> I am just trying to stay as close to Linux as possible. Who knows what
> problems can show
> up with PAT1 != PAT5 or having WT and WC together. Or actually having
> PAT=1 in PTE (which
> Linux has a comment about avoiding from the Intel guys in
> arch/x86/mm/pat.c:pat_init()) :
>
>         /* Set PWT to Write-Combining. All other bits stay the same */
>         /*
>          * PTE encoding used in Linux:
>          *      PAT
>          *      |PCD
>          *      ||PWT
>          *      |||
>          *      000 WB          _PAGE_CACHE_WB
>          *      001 WC          _PAGE_CACHE_WC
>          *      010 UC-         _PAGE_CACHE_UC_MINUS
>          *      011 UC          _PAGE_CACHE_UC
>          * PAT bit unused
>          */
>
>
>> > +static unsigned long pgprot_val(int vmprot)
>> > +{
>> > +     unsigned long   prot = PTE_P | PTE_U | PTE_A;
>> > +
>> > +     if (vmprot & PROT_WRITE)
>> > +             prot |= PTE_W | PTE_D;
>>
>> Just curious, why are PTE_A and PTE_D set?
>>
>>
>>
> I wanted to avoid the cpu coming and setting those bits. I am not sure
> whether Linux
> does the same for the user level uncached mapping, I could check if you
> think there's
> an issue.
>
>
>> It sounds like the main kernel would benefit from another helper for
>> mapping.  We can add something like:
>>
>>         uintptr_t vmap_pmem_writecomb(uintptr_t paddr, size_t nr_bytes);
>>
>> which will map [paddr, paddr + bytes) with the right flags to select
>> the right PAT, and return the kernel address of the mapping.  If you
>> need this for a user address, then we'll need a bit more work.
>>
>
> I believe Linux has something similar (Documentation/x86/pat.txt), but
> yes, I am only
> interested in the user address stuff for now.
>
> So, let me know what you want. Stage this patch in, then move pat_init()
> somewhere
> else, or you want it all in one go.
>
>
>>
>> Barret
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Akaros" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Thanks,
>
> Kanoj
>



-- 
Thanks,

Kanoj

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to