Re: [akaros] [PATCH] Implement write combining on x86

'Kanoj Sarcar' via Akaros Thu, 10 Mar 2016 12:28:17 -0800

On Thu, Mar 10, 2016 at 11:56 AM, Barret Rhoden <[email protected]>
wrote:


> On 2016-03-10 at 11:28 "'Kanoj Sarcar' via Akaros"
> <[email protected]> wrote:
> >
> > MLX uses WC to write data payload into HCA buffers. The performance
> > boost over UC/UC- is significant. PAT details are added as comments
> > above pat_init() routine.
>
> Nice find!
>
> > +
> > +void pat_init(void)
> > +{
> > +     unsigned long patval;
> > +
> > +     patval = read_msr(MSR_IA32_CR_PAT);
> > +     printk("PAT : origvalue = 0x%llx\n", patval);
> > +
> > +     /* Clear PAT1/5 */
> > +     patval &= 0xFFFF00FFFFFF00FFULL;
> > +
> > +     /* Set PAT1/5 to WC */
> > +     patval |= (1 << 8);
> > +     patval |= (1ULL << 40);
> > +
> > +     printk("PAT : newvalue = 0x%llx\n", patval);
> > +     write_msr(MSR_IA32_CR_PAT, patval);
>
> This is probably dangerous.  Does the PAT MSR apply to all cores, or
> just the calling core?  I think it might be just the calling core, and
> it's probably a bad idea to have the cores disagree on the PAT
> settings.
>

Agreed. Note that I created pat_init() as a routine, that can migrate into
arch code and be
invoked from the correct spot. I suggest we integrate the patch as is (its
just me that is
probably turning on this support anyway :-)), then transport the pat_init()
call to a better
location. See below:

>From section 11.12.4 in SDM:
The operating system is responsible for insuring that changes to a PAT
entry occur in a manner that maintains the
consistency of the processor caches and translation lookaside buffers
(TLB). This is accomplished by following the
procedure as specified in Section 11.11.8, “MTRR Considerations in MP
Systems,” for changing the value of an
MTRR in a multiple processor system. It requires a specific sequence of
operations that includes flushing the
processors caches and TLBs.

11.11.8 MTRR Considerations in MP Systems
In MP (multiple-processor) systems, the operating systems must maintain
MTRR consistency between all the
processors in the system. The Pentium 4, Intel Xeon, and P6 family
processors provide no hardware support to
maintain this consistency. In general, all processors must have the same
MTRR values.

So two things: all processors must agree on the setting, and cache/TLB
flush must happen afterwards.


> It makes sense that we need a WC setting in PAT, and we don't have one
> at boot-up.  How about we do the PAT setup in __arch_pcpu_init()?
> (using a helper like the one you've got sounds fine).  I'd rather have
> this taken care of in the main OS, since its something we need
> regardless of the Linux drivers.
>

Agreed. I don't know whether   __arch_pcpu_init() obeys the "flush $/tlb"
afterwards rule ...


> Also, do we need both PAT1 and PAT5 set to WT?  Could we leave PAT 1 as
> WT (so if you select PTE bits PTE_PWT you get WT) and set PAT 5 to WC
> (bits PTE_PAT|PTE_PWT).  I'm not particularly tied to any of the PAT
> locations/numbers; I'm just trying to keep WT as an option in our PAT
> table.
>

I am just trying to stay as close to Linux as possible. Who knows what
problems can show
up with PAT1 != PAT5 or having WT and WC together. Or actually having PAT=1
in PTE (which
Linux has a comment about avoiding from the Intel guys in
arch/x86/mm/pat.c:pat_init()) :

        /* Set PWT to Write-Combining. All other bits stay the same */
        /*
         * PTE encoding used in Linux:
         *      PAT
         *      |PCD
         *      ||PWT
         *      |||
         *      000 WB          _PAGE_CACHE_WB
         *      001 WC          _PAGE_CACHE_WC
         *      010 UC-         _PAGE_CACHE_UC_MINUS
         *      011 UC          _PAGE_CACHE_UC
         * PAT bit unused
         */


> > +static unsigned long pgprot_val(int vmprot)
> > +{
> > +     unsigned long   prot = PTE_P | PTE_U | PTE_A;
> > +
> > +     if (vmprot & PROT_WRITE)
> > +             prot |= PTE_W | PTE_D;
>
> Just curious, why are PTE_A and PTE_D set?
>
>
>
I wanted to avoid the cpu coming and setting those bits. I am not sure
whether Linux
does the same for the user level uncached mapping, I could check if you
think there's
an issue.


> It sounds like the main kernel would benefit from another helper for
> mapping.  We can add something like:
>
>         uintptr_t vmap_pmem_writecomb(uintptr_t paddr, size_t nr_bytes);
>
> which will map [paddr, paddr + bytes) with the right flags to select
> the right PAT, and return the kernel address of the mapping.  If you
> need this for a user address, then we'll need a bit more work.
>

I believe Linux has something similar (Documentation/x86/pat.txt), but yes,
I am only
interested in the user address stuff for now.

So, let me know what you want. Stage this patch in, then move pat_init()
somewhere
else, or you want it all in one go.


>
> Barret
>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Thanks,

Kanoj

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [akaros] [PATCH] Implement write combining on x86

Reply via email to