Hello all,

I've come across the "memcpy_chunk_size" MCA parameter in smsc/xpmem, which
effectively causes memory copies to take place in chunks (it is used in
mca_smsc_xpmem_memmove()). The comment reads:

"Maximum size to copy with a single call to memcpy. On some systems a smaller
or larger number may provide better performance (default: 256k)"
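
To make the question concrete, here is a minimal sketch of what I understand
chunked copying to mean. It is my own illustration, not the actual Open MPI
source: the name chunked_copy and its arguments are hypothetical, and it
assumes non-overlapping buffers and a non-zero chunk size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the Open MPI implementation): copy `size` bytes
 * from `src` to `dst`, issuing at most `chunk_size` bytes per memcpy call.
 * Assumes the buffers do not overlap and that chunk_size > 0. */
static void *chunked_copy(void *dst, const void *src, size_t size,
                          size_t chunk_size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (size > 0) {
        /* The last chunk may be shorter than chunk_size. */
        size_t n = size < chunk_size ? size : chunk_size;
        memcpy(d, s, n);
        d += n;
        s += n;
        size -= n;
    }
    return dst;
}
```

With a 2 MB message, a chunk size of 1 MB gives two memcpy calls of 1 MB each,
while a chunk size of 2 MB gives a single 2 MB call.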

And I have indeed observed a performance difference by adjusting it! E.g. in a
simple point-to-point test, 2 MB messages do significantly better with the
parameter set to 1 MB than with it set to 2 MB. But... why? I suppose I could
imagine a memcpy of larger size being more efficient, but what would cause many
small ones to end up being quicker than a single large one? Might it have
something to do with memcpy intrinsics and different implementations for
different sizes?

If someone knows what's going on under the hood and/or could direct me to any
relevant resources, I would greatly appreciate it!

George
