Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: dev@dpdk.org
> Cc: Olivier Matz <olivier.m...@6wind.com>; Andrew Rybchenko
> <arybche...@solarflare.com>; David Christensen <d...@linux.vnet.ibm.com>;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> hemant.agra...@nxp.com; Shahaf Shuler <shah...@mellanox.com>;
> Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm
> Technology China) <gavin...@arm.com>; vikto...@rehivetech.com;
> anatoly.bura...@intel.com
> Subject: Mbuf memory alignment constraints for (micro)architectures
> CC:  Arch and platform maintainers
> While reviewing the mempool object allocation requirements in the code,
> A) it was found that, in the default case, mempool objects have padding
> in the object trailer so that the start addresses of objects are spread
> among the different channels, to load the DRAM channels equally for
> better performance.
> # More documentation is here:
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3. Memory Alignment Constraints
> B) optimize_object_size() implements the channel distribution requirement
> with the following formula:
>         new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) /
>         while (get_gcd(new_obj_size, nrank * nchan) != 1)
>                new_obj_size++;
> C) The formula mentioned in (B) is NOT generic. At least on the octeontx2
> SoC, the memory/DDR controller works in a different way, whereby:
> # It XORs some of the physical address lines (not the user-space VA)
> to compute a hash, and that hash function defines the actual channel.
> The XOR (a kind of CRC) scheme is useful because it gives a natural channel
> distribution based on the address, i.e. no padding is needed and no memory
> is wasted.
> So, in short, the padding scheme is not needed for some SoCs. I am trying
> to send a patch to fix it. So the question is:
> # Do PPC and other ARM SoCs use formula (B) to compute DRAM channel
> distribution, or is it specific to x86? That would define where the hooks
> need to be added to have a proper fix.
Reading through some documents, both x86 and Arm, and having internal
discussions, it looks like this is specific to x86: x86 spreads adjacent
virtual addresses within a page across multiple memory devices, with the
interleaving done per one or two cache lines.

Arm leaves this flexibility to implementations; there is no fixed
interleaving pattern, and thus it can hardly be generalized.
