Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <[email protected]>
> Sent: Thursday, October 31, 2019 2:02 AM
> To: [email protected]
> Cc: Olivier Matz <[email protected]>; Andrew Rybchenko
> <[email protected]>; David Christensen <[email protected]>;
> [email protected]; [email protected];
> [email protected]; Shahaf Shuler <[email protected]>;
> Honnappa Nagarahalli <[email protected]>; Gavin Hu (Arm
> Technology China) <[email protected]>; [email protected];
> [email protected]
> Subject: Mbuf memory alignment constraints for (micro)architectures
> 
> CC:  Arch and platform maintainers
> 
> While reviewing the mempool object allocation requirements in the code,
> 
> A) it was found that, in the default case, mempool objects have padding in
> the object trailer so that object start addresses are spread among the
> different channels, balancing the load on the DRAM channels for better
> performance.
> 
> # More documentation is here:
> https://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> in section 8.3, "Memory Alignment Constraints".
> 
> B) optimize_object_size() implements the channel distribution requirement
> with the following formula:
> 
>         new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
>         while (get_gcd(new_obj_size, nrank * nchan) != 1)
>                 new_obj_size++;
> 
> 
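
A minimal C sketch of formula (B), for illustration only. It assumes a
64-byte alignment (the cache-line size on typical x86) in place of
RTE_MEMPOOL_ALIGN, and takes nchan/nrank as plain parameters; in DPDK these
come from the EAL configuration. padded_obj_size(), gcd_u() and the SKETCH_*
macros are hypothetical names, not DPDK API.

```c
#include <assert.h>

/* Assumed stand-ins for RTE_MEMPOOL_ALIGN / RTE_MEMPOOL_ALIGN_MASK. */
#define SKETCH_ALIGN      64u
#define SKETCH_ALIGN_MASK (SKETCH_ALIGN - 1u)

/* Greatest common divisor via Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
	while (b != 0) {
		unsigned t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Round obj_size up to whole alignment units (cache lines), then add one
 * unit at a time until the unit count is coprime with nrank * nchan.
 * Returns the padded object size in bytes. */
static unsigned padded_obj_size(unsigned obj_size, unsigned nchan,
				unsigned nrank)
{
	assert(nchan > 0 && nrank > 0);

	unsigned lines = (obj_size + SKETCH_ALIGN_MASK) / SKETCH_ALIGN;
	while (gcd_u(lines, nrank * nchan) != 1)
		lines++;
	return lines * SKETCH_ALIGN;
}
```

For example, a 2176-byte object with 4 channels and 1 rank is 34 lines;
gcd(34, 4) = 2, so it is padded to 35 lines (2240 bytes), i.e. 64 bytes of
trailer per object. With a single channel and rank nothing is added.
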
> C) The formula in (B) is NOT generic. At least on the octeontx2 SoC, the
> memory/DDR controller works differently:
> # It XORs some of the physical address lines (not the user-space VA) to
> compute a hash, and that hash function determines the actual channel.
> 
> The XOR (CRC-like) scheme is useful because it gives a natural channel
> distribution based on the address, i.e. there is no need for padding that
> wastes memory.
> 
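
To illustrate the XOR idea, here is a hypothetical C model. It is NOT the
real octeontx2 hash (which address lines it uses is not stated here); it
simply assumes 4 channels chosen by XOR-folding the 64-byte cache-line index
of the physical address two bits at a time. chan_xor(),
xor_channels_used() and the XOR_* macros are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 64-byte lines, 4 channels (2 channel bits). */
#define XOR_LINE_SHIFT 6u
#define XOR_CHAN_BITS  2u
#define XOR_NCHAN      (1u << XOR_CHAN_BITS)

/* Channel = XOR of every 2-bit group of the cache-line index, so many
 * physical address lines feed the choice, not just the lowest ones. */
static unsigned chan_xor(uint64_t pa)
{
	uint64_t line = pa >> XOR_LINE_SHIFT;
	unsigned h = 0;

	while (line != 0) {
		h ^= (unsigned)(line & (XOR_NCHAN - 1u));
		line >>= XOR_CHAN_BITS;
	}
	return h;
}

/* Number of distinct channels hit by the start addresses of n objects
 * laid out back to back with the given stride, starting at PA 0. */
static unsigned xor_channels_used(uint64_t stride, unsigned n)
{
	unsigned seen = 0;	/* bitmap of channels hit */
	unsigned count = 0;

	assert(n > 0);
	for (unsigned i = 0; i < n; i++)
		seen |= 1u << chan_xor((uint64_t)i * stride);
	for (unsigned c = 0; c < XOR_NCHAN; c++)
		count += (seen >> c) & 1u;
	return count;
}
```

In this model, unpadded 2048-byte objects start on channels 0, 2, 1 and 3,
so all four channels are used without any trailer padding. How well a real
controller spreads objects depends on which address lines it actually hashes.
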
> So, in short, the padding scheme is not needed on some SoCs. I am trying to
> send a patch to fix it. So the question is:
> 
> # Do PPC and other Arm SoCs use formula (B) to compute DRAM channel
> distribution, or is it specific to x86? That determines where the hooks
> need to be added for a proper fix.
Having read through some documents for both x86 and Arm, and after internal
discussion, it looks like this is specific to x86. x86 spreads adjacent
virtual addresses within a page across multiple memory devices, with the
interleaving done per one or two cache lines.
https://software.intel.com/en-us/articles/how-memory-is-accessed
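
A simplified C model of that x86-style interleave, assuming 4 channels and a
fixed round-robin granule of one or two 64-byte cache lines. It ignores ranks
and any extra hashing a real memory controller may apply; chan_interleave(),
il_channels_used() and IL_NCHAN are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed number of memory channels in this model. */
#define IL_NCHAN 4u

/* Consecutive granule-sized blocks go to consecutive channels. */
static unsigned chan_interleave(uint64_t pa, uint64_t granule)
{
	return (unsigned)((pa / granule) % IL_NCHAN);
}

/* Number of distinct channels hit by the start addresses of n objects
 * laid out back to back with the given stride, starting at PA 0. */
static unsigned il_channels_used(uint64_t stride, uint64_t granule,
				 unsigned n)
{
	unsigned seen = 0;	/* bitmap of channels hit */
	unsigned count = 0;

	assert(granule > 0 && n > 0);
	for (unsigned i = 0; i < n; i++)
		seen |= 1u << chan_interleave((uint64_t)i * stride, granule);
	for (unsigned c = 0; c < IL_NCHAN; c++)
		count += (seen >> c) & 1u;
	return count;
}
```

With a 64-byte granule, a 2048-byte stride (32 lines) always starts on
channel 0 and a 2176-byte stride (34 lines) only reaches channels 0 and 2,
while the 2240-byte stride (35 lines) produced by formula (B) reaches all
four. This is the situation the gcd-based padding targets.
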

Arm leaves this to each implementation; there is no fixed interleaving
pattern, so it can hardly be generalized.
/Gavin