Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: dev@dpdk.org
> Cc: Olivier Matz <olivier.m...@6wind.com>; Andrew Rybchenko
> <arybche...@solarflare.com>; David Christensen <d...@linux.vnet.ibm.com>;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> hemant.agra...@nxp.com; Shahaf Shuler <shah...@mellanox.com>;
> Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm
> Technology China) <gavin...@arm.com>; vikto...@rehivetech.com;
> anatoly.bura...@intel.com
> Subject: Mbuf memory alignment constraints for (micro)architectures
>
> CC: Arch and platform maintainers
>
> While reviewing the mempool objection allocation requirements in the code,
>
> A) it's found that in the default case, mempool objects have padding
> in the object trailer to have start addresses of objects among the different
> channels,
> to enable equally load on the DRAM channel to have better performance
>
> # More documentation is here
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3. Memory Alignment Constraints
>
> B) The optimize_object_size() does the channel distribution requirement
> by the following formula
>
> new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) /
> RTE_MEMPOOL_ALIGN;
> while (get_gcd(new_obj_size, nrank * nchan) != 1)
>     new_obj_size++;
>
>
> C) The formula mentioned in the (B) is NOT generic. At least of the octeontx2
> SoC
> The memory/DDR controller works in different way. Where by:
> # It does XOR operation of some of physical address lines(not the user space
> VA address)
> to compute the hash and that the function defines the actual channel.
>
> The XOR(kind of CRC) scheme is useful because there is natural channel
> distribution
> based on the address i.e No need to have padding to waste memory
>
> So, in short the padding scheme does not need for some SoC. I trying to send
> the patch
> to fix it.
> So the questions is,
>
> # Is PPC and other ARM SoC has formula (B) to compute DRAM channel
> distribution ? or
> Is it specific to x86? That would define where the hooks needs to added to
> have
> proper fix.

Having read through some documents for both x86 and Arm, and after some
internal discussion, it looks like this is specific to x86. x86 spreads
adjacent virtual addresses within a page across multiple memory devices,
with the interleaving done at a granularity of one or two cache lines.
https://software.intel.com/en-us/articles/how-memory-is-accessed
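
For reference, here is a minimal standalone sketch in C of the padding rule
quoted in (B). It is not the DPDK source: gcd() and padded_obj_size() are
hypothetical names, the alignment unit is passed in as a parameter instead of
using RTE_MEMPOOL_ALIGN, and the final multiplication back into bytes is my
assumption about what follows the quoted lines.

```c
#include <assert.h>

/* Greatest common divisor (Euclid); stands in for get_gcd() in (B). */
static unsigned int gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/*
 * Round obj_size up to whole units of `align` bytes, then grow it one
 * unit at a time until the unit count is coprime with nrank * nchan.
 * With a coprime stride, consecutive objects start on different
 * channel/rank combinations under a simple modulo interleave.
 * Returns the padded size in bytes (assumed, see the note above).
 */
static unsigned int padded_obj_size(unsigned int obj_size, unsigned int align,
                                    unsigned int nchan, unsigned int nrank)
{
    unsigned int units;

    assert(align != 0 && nchan != 0 && nrank != 0);

    units = (obj_size + align - 1) / align;
    while (gcd(units, nrank * nchan) != 1)
        units++;
    return units * align;
}
```

With a 64-byte unit, 4 channels and 1 rank, padded_obj_size(2048, 64, 4, 1)
gives 2112, i.e. 64 bytes of padding per object. That padding is the memory
cost which is unnecessary when the controller hashes the address instead.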
Arm leaves this to the implementation: there is no fixed pattern for
interleaving, so it can hardly be generalized.

/Gavin