On Mon, 13 May 2013, Richard Hughes wrote:
>
> I figured 4 threads should be ~4x faster than using 1 thread (in the
> second case we should only have 4 threads, so not much overhead), but
> no matter the value of max_threads or 'n' I can only achieve a ~1.9x
> speed-up. I've tried with and without cmsFLAGS_NOCACHE. Any pointers
> very welcome.

What specific CPU are you using?

It would be good to share the ICC profile you are using for testing 
since it can make a difference.  If lcms is only doing indexed lookups 
for the profile, then memory accesses may be the bottleneck rather 
than CPU.

Are you sharing the same transform (created by one thread), or are you 
creating an independent transform for each thread (ideally created by 
the thread which uses it)?  Creating the transform can consume 
considerable time, so it can be useful to parallelize (even though it 
"wastes" CPU), and it may help things work better given whatever NUMA 
characteristics pertain to your hardware.
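
As a rough sketch of the per-thread approach, assuming POSIX threads: 
each worker below builds its own private "transform" and applies it to 
its own stripe of the buffer.  The lookup table is only a hypothetical 
stand-in so the sketch compiles without lcms; with lcms2 the create and 
apply steps would be cmsCreateTransform() and cmsDoTransform().  The 
names stripe_job, stripe_worker, and transform_parallel are made up for 
illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define STRIPES  4      /* one worker thread per stripe */
#define LUT_SIZE 256

/* Work description for one thread (hypothetical helper type). */
struct stripe_job {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         len;
    int            ok;  /* set to 1 by the worker on success */
};

static void *stripe_worker(void *arg)
{
    struct stripe_job *job = arg;

    /* Build the stand-in "transform" inside the thread that uses it,
     * so nothing is shared with the other workers. */
    uint8_t *lut = malloc(LUT_SIZE);
    if (lut == NULL) {
        job->ok = 0;
        return NULL;
    }
    for (int i = 0; i < LUT_SIZE; i++)
        lut[i] = (uint8_t)(255 - i);        /* toy mapping: invert */

    for (size_t i = 0; i < job->len; i++)
        job->dst[i] = lut[job->src[i]];

    free(lut);
    job->ok = 1;
    return NULL;
}

/* Split the buffer into STRIPES contiguous stripes, one thread each.
 * Returns 0 on success, -1 if a thread could not be started or failed. */
static int transform_parallel(const uint8_t *src, uint8_t *dst, size_t len)
{
    pthread_t tid[STRIPES];
    struct stripe_job jobs[STRIPES];
    size_t chunk = (len + STRIPES - 1) / STRIPES;
    int started = 0, status = 0;

    for (int t = 0; t < STRIPES; t++) {
        size_t begin = (size_t)t * chunk;
        if (begin > len)
            begin = len;
        size_t end = (begin + chunk > len) ? len : begin + chunk;

        jobs[t] = (struct stripe_job){ src + begin, dst + begin,
                                       end - begin, 0 };
        if (pthread_create(&tid[t], NULL, stripe_worker, &jobs[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        if (!jobs[t].ok)
            status = -1;
    }
    return status;
}
```

Calling transform_parallel(src, dst, len) fills dst from src.  Whether 
this beats a single shared transform depends on the profile and the 
hardware, so it is worth timing both.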

Cache-line effects can be significant if there is accidental 
cache-line sharing (two cores sharing data in the same cache line). 
Padding structures to prevent false-sharing or using an aligned memory 
allocator can help surmount such problems.  Cache line issues can be 
very hardware/OS specific and mysterious.
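
To illustrate the padding idea, here is a minimal C11 sketch, assuming 
POSIX threads and a 64-byte cache line (common on x86, but check your 
CPU).  Each thread's counter is aligned to its own cache line, so 
updates from different cores never land in the same line.  The names 
worker_slot, slot_worker, and run_workers are hypothetical.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE  64          /* assumption: 64-byte cache lines */
#define NUM_WORKERS 4
#define ITERATIONS  1000000ULL

/* alignas on the member raises the alignment of the whole struct, so
 * sizeof(struct worker_slot) is rounded up to CACHE_LINE and adjacent
 * array elements never share a line. */
struct worker_slot {
    alignas(CACHE_LINE) uint64_t pixels_done;
};

static struct worker_slot slots[NUM_WORKERS];

static void *slot_worker(void *arg)
{
    struct worker_slot *slot = arg;

    /* Each thread writes only to its own slot. */
    for (uint64_t i = 0; i < ITERATIONS; i++)
        slot->pixels_done++;
    return NULL;
}

/* Run all workers and return the sum of their counters. */
static uint64_t run_workers(void)
{
    pthread_t tid[NUM_WORKERS];
    uint64_t total = 0;
    int started = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        slots[i].pixels_done = 0;
        if (pthread_create(&tid[i], NULL, slot_worker, &slots[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += slots[i].pixels_done;
    }
    return total;
}
```

Removing the alignas packs the four counters into a single cache line, 
which is a simple way to see how much false sharing costs on a given 
machine.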

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
