On Thu, Mar 10, 2016 at 11:56 AM, Barret Rhoden <[email protected]> wrote:
> On 2016-03-10 at 11:28 "'Kanoj Sarcar' via Akaros" > <[email protected]> wrote: > > > > MLX uses WC to write data payload into HCA buffers. The performance > > boost over UC/UC- is significant. PAT details are added as comments > > above pat_init() routine. > > Nice find! > > > + > > +void pat_init(void) > > +{ > > + unsigned long patval; > > + > > + patval = read_msr(MSR_IA32_CR_PAT); > > + printk("PAT : origvalue = 0x%llx\n", patval); > > + > > + /* Clear PAT1/5 */ > > + patval &= 0xFFFF00FFFFFF00FFULL; > > + > > + /* Set PAT1/5 to WC */ > > + patval |= (1 << 8); > > + patval |= (1ULL << 40); > > + > > + printk("PAT : newvalue = 0x%llx\n", patval); > > + write_msr(MSR_IA32_CR_PAT, patval); > > This is probably dangerous. Does the PAT MSR apply to all cores, or > just the calling core? I think it might be just the calling core, and > it's probably a bad idea to have the cores disagree on the PAT > settings. > Agreed. Note that I created pat_init() as a routine, that can migrate into arch code and be invoked from the correct spot. I suggest we integrate the patch as is (its just me that is probably turning on this support anyway :-)), then transport the pat_init() call to a better location. See below: >From section 11.12.4 in SDM: The operating system is responsible for insuring that changes to a PAT entry occur in a manner that maintains the consistency of the processor caches and translation lookaside buffers (TLB). This is accomplished by following the procedure as specified in Section 11.11.8, “MTRR Considerations in MP Systems,” for changing the value of an MTRR in a multiple processor system. It requires a specific sequence of operations that includes flushing the processors caches and TLBs. 11.11.8 MTRR Considerations in MP Systems In MP (multiple-processor) systems, the operating systems must maintain MTRR consistency between all the processors in the system. The Pentium 4, Intel Xeon, and P6 family processors provide no hardware support to maintain this consistency. In general, all processors must have the same MTRR values. So two things: all processors must agree on the setting, and cache/TLB flush must happen afterwards. > It makes sense that we need a WC setting in PAT, and we don't have one > at boot-up. How about we do the PAT setup in __arch_pcpu_init()? > (using a helper like the one you've got sounds fine). I'd rather have > this taken care of in the main OS, since its something we need > regardless of the Linux drivers. > Agreed. I don't know whether __arch_pcpu_init() obeys the "flush $/tlb" afterwards rule ... > Also, do we need both PAT1 and PAT5 set to WT? Could we leave PAT 1 as > WT (so if you select PTE bits PTE_PWT you get WT) and set PAT 5 to WC > (bits PTE_PAT|PTE_PWT). I'm not particularly tied to any of the PAT > locations/numbers; I'm just trying to keep WT as an option in our PAT > table. > I am just trying to stay as close to Linux as possible. Who knows what problems can show up with PAT1 != PAT5 or having WT and WC together. Or actually having PAT=1 in PTE (which Linux has a comment about avoiding from the Intel guys in arch/x86/mm/pat.c:pat_init()) : /* Set PWT to Write-Combining. All other bits stay the same */ /* * PTE encoding used in Linux: * PAT * |PCD * ||PWT * ||| * 000 WB _PAGE_CACHE_WB * 001 WC _PAGE_CACHE_WC * 010 UC- _PAGE_CACHE_UC_MINUS * 011 UC _PAGE_CACHE_UC * PAT bit unused */ > > +static unsigned long pgprot_val(int vmprot) > > +{ > > + unsigned long prot = PTE_P | PTE_U | PTE_A; > > + > > + if (vmprot & PROT_WRITE) > > + prot |= PTE_W | PTE_D; > > Just curious, why are PTE_A and PTE_D set? > > > I wanted to avoid the cpu coming and setting those bits. I am not sure whether Linux does the same for the user level uncached mapping, I could check if you think there's an issue. > It sounds like the main kernel would benefit from another helper for > mapping. We can add something like: > > uintptr_t vmap_pmem_writecomb(uintptr_t paddr, size_t nr_bytes); > > which will map [paddr, paddr + bytes) with the right flags to select > the right PAT, and return the kernel address of the mapping. If you > need this for a user address, then we'll need a bit more work. > I believe Linux has something similar (Documentation/x86/pat.txt), but yes, I am only interested in the user address stuff for now. So, let me know what you want. Stage this patch in, then move pat_init() somewhere else, or you want it all in one go. > > Barret > > -- > You received this message because you are subscribed to the Google Groups > "Akaros" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- Thanks, Kanoj -- You received this message because you are subscribed to the Google Groups "Akaros" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. For more options, visit https://groups.google.com/d/optout.
