Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <[email protected]>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: [email protected]
> Cc: Olivier Matz <[email protected]>; Andrew Rybchenko
> <[email protected]>; David Christensen <[email protected]>;
> [email protected]; [email protected];
> [email protected]; Shahaf Shuler <[email protected]>;
> Honnappa Nagarahalli <[email protected]>; Gavin Hu (Arm
> Technology China) <[email protected]>; [email protected];
> [email protected]
> Subject: Mbuf memory alignment constraints for (micro)architectures
>
> CC: Arch and platform maintainers
>
> While reviewing the mempool object allocation requirements in the code,
>
> A) it's found that in the default case, mempool objects have padding
> in the object trailer so that the start addresses of objects are spread
> among the different channels, to load the DRAM channels equally for
> better performance.
>
> # More documentation is here
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3. Memory Alignment Constraints
>
> B) optimize_object_size() implements the channel distribution requirement
> with the following formula:
>
> new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
> while (get_gcd(new_obj_size, nrank * nchan) != 1)
>         new_obj_size++;
>
> C) The formula in (B) is NOT generic. At least on the octeontx2 SoC,
> the memory/DDR controller works in a different way, whereby:
> # It XORs some of the physical address lines (not the user-space VA)
> to compute a hash, and that function defines the actual channel.
>
> The XOR (kind of CRC) scheme is useful because there is natural channel
> distribution based on the address, i.e. no need for padding that wastes
> memory.
>
> So, in short, the padding scheme is not needed for some SoCs. I am trying
> to send a patch to fix it.
> So the question is:
>
> # Do PPC and other ARM SoCs use formula (B) to compute the DRAM channel
> distribution, or is it specific to x86? That would define where the
> hooks need to be added for a proper fix.

After reading through some x86 and Arm documents and discussing this
internally, it looks like this is specific to x86. x86 spreads adjacent
virtual addresses within a page across multiple memory devices, with the
interleaving done at a granularity of one or two cache lines:
https://software.intel.com/en-us/articles/how-memory-is-accessed
Arm leaves this to the implementation: there is no fixed interleaving
pattern, so it can hardly be generalized.

/Gavin
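
For reference, here is a minimal standalone sketch of the padding
computation quoted in (B). It is an illustration under stated assumptions,
not the DPDK source: SKETCH_ALIGN is an assumed stand-in for
RTE_MEMPOOL_ALIGN (64 bytes here), the sketch_* names are hypothetical,
nchan and nrank are passed in as parameters instead of being read from
the EAL, and converting the result back to bytes at the end is my
reading of how the unit count is meant to be used.

```c
/* Assumed stand-in for RTE_MEMPOOL_ALIGN; not taken from DPDK headers. */
#define SKETCH_ALIGN      64u
#define SKETCH_ALIGN_MASK (SKETCH_ALIGN - 1u)

/* Greatest common divisor by Euclid's algorithm; gcd(a, 0) == a. */
static unsigned sketch_gcd(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper mirroring the formula in (B): round the object size
 * up to whole alignment units, then grow it until the unit count is
 * coprime with nrank * nchan, so that consecutive objects start on
 * different channels/ranks under a simple modulo interleaving.
 */
static unsigned sketch_padded_obj_size(unsigned obj_size, unsigned nchan,
				       unsigned nrank)
{
	unsigned ways = nchan * nrank;
	unsigned units;

	/* Assumption: treat an unknown (zero) topology as a single way. */
	if (ways == 0)
		ways = 1;

	units = (obj_size + SKETCH_ALIGN_MASK) / SKETCH_ALIGN;
	while (sketch_gcd(units, ways) != 1)
		units++;

	return units * SKETCH_ALIGN;
}
```

With 4 channels and 1 rank, a 2304-byte object (36 units) grows to 37
units, i.e. 2368 bytes, so 64 bytes per object are spent on padding. On a
controller that hashes address bits to pick the channel, that padding
buys nothing, which is the point raised in (C).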

