Sorry, one more thing I missed pointing out: vmx_setup_constant_host_state() reads MSR_IA32_CR_PAT. I am assuming this happens way after early initialization, but not sure what the effect of changing the PAT is on this logic.
On Thu, Mar 10, 2016 at 12:21 PM, Kanoj Sarcar <[email protected]> wrote: > > > On Thu, Mar 10, 2016 at 11:56 AM, Barret Rhoden <[email protected]> > wrote: > >> On 2016-03-10 at 11:28 "'Kanoj Sarcar' via Akaros" >> <[email protected]> wrote: >> > >> > MLX uses WC to write data payload into HCA buffers. The performance >> > boost over UC/UC- is significant. PAT details are added as comments >> > above pat_init() routine. >> >> Nice find! >> >> > + >> > +void pat_init(void) >> > +{ >> > + unsigned long patval; >> > + >> > + patval = read_msr(MSR_IA32_CR_PAT); >> > + printk("PAT : origvalue = 0x%llx\n", patval); >> > + >> > + /* Clear PAT1/5 */ >> > + patval &= 0xFFFF00FFFFFF00FFULL; >> > + >> > + /* Set PAT1/5 to WC */ >> > + patval |= (1 << 8); >> > + patval |= (1ULL << 40); >> > + >> > + printk("PAT : newvalue = 0x%llx\n", patval); >> > + write_msr(MSR_IA32_CR_PAT, patval); >> >> This is probably dangerous. Does the PAT MSR apply to all cores, or >> just the calling core? I think it might be just the calling core, and >> it's probably a bad idea to have the cores disagree on the PAT >> settings. >> > > Agreed. Note that I created pat_init() as a routine, that can migrate into > arch code and be > invoked from the correct spot. I suggest we integrate the patch as is (its > just me that is > probably turning on this support anyway :-)), then transport the > pat_init() call to a better > location. See below: > > From section 11.12.4 in SDM: > The operating system is responsible for insuring that changes to a PAT > entry occur in a manner that maintains the > consistency of the processor caches and translation lookaside buffers > (TLB). This is accomplished by following the > procedure as specified in Section 11.11.8, “MTRR Considerations in MP > Systems,” for changing the value of an > MTRR in a multiple processor system. It requires a specific sequence of > operations that includes flushing the > processors caches and TLBs. > > 11.11.8 MTRR Considerations in MP Systems > In MP (multiple-processor) systems, the operating systems must maintain > MTRR consistency between all the > processors in the system. The Pentium 4, Intel Xeon, and P6 family > processors provide no hardware support to > maintain this consistency. In general, all processors must have the same > MTRR values. > > So two things: all processors must agree on the setting, and cache/TLB > flush must happen afterwards. > > >> It makes sense that we need a WC setting in PAT, and we don't have one >> at boot-up. How about we do the PAT setup in __arch_pcpu_init()? >> (using a helper like the one you've got sounds fine). I'd rather have >> this taken care of in the main OS, since its something we need >> regardless of the Linux drivers. >> > > Agreed. I don't know whether __arch_pcpu_init() obeys the "flush $/tlb" > afterwards rule ... > > >> Also, do we need both PAT1 and PAT5 set to WT? Could we leave PAT 1 as >> WT (so if you select PTE bits PTE_PWT you get WT) and set PAT 5 to WC >> (bits PTE_PAT|PTE_PWT). I'm not particularly tied to any of the PAT >> locations/numbers; I'm just trying to keep WT as an option in our PAT >> table. >> > > I am just trying to stay as close to Linux as possible. Who knows what > problems can show > up with PAT1 != PAT5 or having WT and WC together. Or actually having > PAT=1 in PTE (which > Linux has a comment about avoiding from the Intel guys in > arch/x86/mm/pat.c:pat_init()) : > > /* Set PWT to Write-Combining. All other bits stay the same */ > /* > * PTE encoding used in Linux: > * PAT > * |PCD > * ||PWT > * ||| > * 000 WB _PAGE_CACHE_WB > * 001 WC _PAGE_CACHE_WC > * 010 UC- _PAGE_CACHE_UC_MINUS > * 011 UC _PAGE_CACHE_UC > * PAT bit unused > */ > > >> > +static unsigned long pgprot_val(int vmprot) >> > +{ >> > + unsigned long prot = PTE_P | PTE_U | PTE_A; >> > + >> > + if (vmprot & PROT_WRITE) >> > + prot |= PTE_W | PTE_D; >> >> Just curious, why are PTE_A and PTE_D set? >> >> >> > I wanted to avoid the cpu coming and setting those bits. I am not sure > whether Linux > does the same for the user level uncached mapping, I could check if you > think there's > an issue. > > >> It sounds like the main kernel would benefit from another helper for >> mapping. We can add something like: >> >> uintptr_t vmap_pmem_writecomb(uintptr_t paddr, size_t nr_bytes); >> >> which will map [paddr, paddr + bytes) with the right flags to select >> the right PAT, and return the kernel address of the mapping. If you >> need this for a user address, then we'll need a bit more work. >> > > I believe Linux has something similar (Documentation/x86/pat.txt), but > yes, I am only > interested in the user address stuff for now. > > So, let me know what you want. Stage this patch in, then move pat_init() > somewhere > else, or you want it all in one go. > > >> >> Barret >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Akaros" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > > > -- > Thanks, > > Kanoj > -- Thanks, Kanoj -- You received this message because you are subscribed to the Google Groups "Akaros" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. For more options, visit https://groups.google.com/d/optout.
