Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: dev@dpdk.org
> Cc: Olivier Matz <olivier.m...@6wind.com>; Andrew Rybchenko
> <arybche...@solarflare.com>; David Christensen <d...@linux.vnet.ibm.com>;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> hemant.agra...@nxp.com; Shahaf Shuler <shah...@mellanox.com>;
> Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm
> Technology China) <gavin...@arm.com>; vikto...@rehivetech.com;
> anatoly.bura...@intel.com
> Subject: Mbuf memory alignment constraints for (micro)architectures
> 
> CC:  Arch and platform maintainers
> 
> While reviewing the mempool object allocation requirements in the code,
> 
> A) it was found that in the default case, mempool objects have padding in
> the object trailer so that the start addresses of objects are spread across
> the different channels, to load the DRAM channels equally for better
> performance.
> 
> # More documentation is here
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3. Memory Alignment Constraints
> 
> B) optimize_object_size() implements the channel distribution requirement
> with the following formula:
> 
>         new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
>         while (get_gcd(new_obj_size, nrank * nchan) != 1)
>                new_obj_size++;
> 
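To make the effect of formula (B) concrete, here is a minimal standalone sketch of the same arithmetic. It does not use the DPDK headers: `ALIGN`, `gcd()` and `pad_for_channels()` are hypothetical stand-ins for `RTE_MEMPOOL_ALIGN`, `get_gcd()` and `optimize_object_size()`, and the 64-byte alignment is an assumption. The final multiplication back to bytes is not part of the quoted snippet; it is added here so the result is a size in bytes.

```c
#include <assert.h>

/* Assumption: objects are padded in 64-byte units (stand-in for RTE_MEMPOOL_ALIGN). */
#define ALIGN      64u
#define ALIGN_MASK (ALIGN - 1u)

/* Euclid's algorithm; stand-in for get_gcd(). */
static unsigned gcd(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Round obj_size up to ALIGN units, then grow it until the unit count is
 * coprime with nrank * nchan, so consecutive objects start on different
 * channels/ranks under a plain modulo interleave. Returns bytes.
 */
static unsigned pad_for_channels(unsigned obj_size, unsigned nchan, unsigned nrank)
{
	unsigned units;

	assert(nchan != 0 && nrank != 0);
	units = (obj_size + ALIGN_MASK) / ALIGN;
	while (gcd(units, nrank * nchan) != 1)
		units++;
	return units * ALIGN;
}
```

Under these assumptions, a 2048-byte object with 4 channels and 1 rank is 32 units, which shares a factor with 4, so it grows to 33 units (2112 bytes). That extra 64 bytes per object is the padding the question is about.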
> 
> C) The formula in (B) is NOT generic. At least on the octeontx2 SoC, the
> memory/DDR controller works in a different way:
> # It XORs some of the physical address lines (not the user space VA) to
> compute a hash, and that function defines the actual channel.
> 
> The XOR (CRC-like) scheme is useful because it gives a natural channel
> distribution based on the address, i.e. there is no need to waste memory
> on padding.
> 
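The difference between the two schemes can be sketched with a toy model. Everything below is hypothetical: the 64-byte interleave granule, the 4-channel XOR fold in `chan_xor4()`, and the helper names are assumptions for illustration only, not the octeontx2 hash (which is not given in this thread). The addresses stand for physical addresses, since that is what the controller hashes.

```c
#include <stdint.h>

#define GRANULE_SHIFT 6u /* assumption: 64-byte interleave granule */

/* Plain interleave: consecutive granules rotate through the channels. */
static unsigned chan_modulo(uint64_t pa, unsigned nchan)
{
	return (unsigned)((pa >> GRANULE_SHIFT) % nchan);
}

/* Hypothetical XOR hash: fold the granule index in 2-bit groups (4 channels). */
static unsigned chan_xor4(uint64_t pa)
{
	uint64_t g = pa >> GRANULE_SHIFT;
	unsigned h = 0;

	while (g != 0) {
		h ^= (unsigned)(g & 3u);
		g >>= 2;
	}
	return h;
}

/* Bitmask of channels hit by the start addresses of n back-to-back objects. */
static unsigned mask_modulo(uint64_t stride, unsigned n, unsigned nchan)
{
	unsigned mask = 0;

	for (unsigned i = 0; i < n; i++)
		mask |= 1u << chan_modulo((uint64_t)i * stride, nchan);
	return mask;
}

static unsigned mask_xor4(uint64_t stride, unsigned n)
{
	unsigned mask = 0;

	for (unsigned i = 0; i < n; i++)
		mask |= 1u << chan_xor4((uint64_t)i * stride);
	return mask;
}
```

In this model, four unpadded 2048-byte objects all start on channel 0 with the modulo interleave (`mask_modulo(2048, 4, 4)` is `0x1`), while the XOR fold already spreads them over all four channels (`mask_xor4(2048, 4)` is `0xF`). Padding to 2112 bytes is what the modulo case needs to reach `0xF`.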
> So, in short, the padding scheme is not needed for some SoCs. I am trying
> to send a patch to fix it. So the question is:
> 
> # Do PPC and other ARM SoCs use formula (B) to compute DRAM channel
> distribution, or is it specific to x86? That would define where the hooks
> need to be added to have a proper fix.
From reading some documents for both x86 and Arm, and from internal
discussion, it looks like this is specific to x86. x86 spreads adjacent
virtual addresses within a page across multiple memory devices, with the
interleaving done per one or two cache lines:
https://software.intel.com/en-us/articles/how-memory-is-accessed

Arm leaves this to the implementation; there is no fixed interleaving
pattern, so it can hardly be generalized.
/Gavin