Hi folks of the Little CMS mailing list,

I'm just curious: given that PCs and Macs are both based on Intel
processors nowadays, do we have a feel for how much Little CMS is being
used on other processor architectures?

I ask because I'm considering submitting some optimizations I've made to
the interpolation routines that speed things up on Intel-based systems.
They're not Intel-specific; they are just optimizations of the source
code that pick up roughly 5% to 20% in speed in the x64 testbed tests.

Git code as of yesterday, as measured on my dual-Xeon Westmere
workstation:

P E R F O R M A N C E   T E S T S
=================================

16 bits on CLUT profiles                     : 34.4828 MPixel/sec.
8 bits on CLUT profiles                      : 32.3232 MPixel/sec.
8 bits on Matrix-Shaper profiles             : 66.6667 MPixel/sec.
8 bits on SAME Matrix-Shaper profiles        : 120.301 MPixel/sec.
8 bits on Matrix-Shaper profiles (AbsCol)    : 66.6667 MPixel/sec.
16 bits on Matrix-Shaper profiles            : 34.4828 MPixel/sec.
16 bits on SAME Matrix-Shaper profiles       : 137.931 MPixel/sec.
16 bits on Matrix-Shaper profiles (AbsCol)   : 34.4828 MPixel/sec.
8 bits on curves                             : 88.8889 MPixel/sec.
16 bits on curves                            : 91.4286 MPixel/sec.
8 bits on CMYK profiles                      : 11.9314 MPixel/sec.
16 bits on CMYK profiles                     : 11.976 MPixel/sec.
8 bits on gray-to gray                       : 104.575 MPixel/sec.
8 bits on gray-to-lab gray                   : 105.263 MPixel/sec.
8 bits on SAME gray-to-gray                  : 105.263 MPixel/sec.

My current code:

P E R F O R M A N C E   T E S T S
=================================

16 bits on CLUT profiles                     : 38.5542 MPixel/sec.
8 bits on CLUT profiles                      : 33.0579 MPixel/sec.
8 bits on Matrix-Shaper profiles             : 66.1157 MPixel/sec.
8 bits on SAME Matrix-Shaper profiles        : 121.212 MPixel/sec.
8 bits on Matrix-Shaper profiles (AbsCol)    : 66.9456 MPixel/sec.
16 bits on Matrix-Shaper profiles            : 38.5542 MPixel/sec.
16 bits on SAME Matrix-Shaper profiles       : 142.857 MPixel/sec.
16 bits on Matrix-Shaper profiles (AbsCol)   : 38.5542 MPixel/sec.
8 bits on curves                             : 89.3855 MPixel/sec.
16 bits on curves                            : 94.1176 MPixel/sec.
8 bits on CMYK profiles                      : 14.4796 MPixel/sec.
16 bits on CMYK profiles                     : 14.5587 MPixel/sec.
8 bits on gray-to gray                       : 125 MPixel/sec.
8 bits on gray-to-lab gray                   : 124.031 MPixel/sec.
8 bits on SAME gray-to-gray                  : 124.031 MPixel/sec.

These translate to real product gains. For example, with a
100-megapixel, 32-bit grayscale image, our heavily multi-threaded
transform time dropped from 1485 milliseconds to 968 milliseconds.
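For anyone who wants to turn the figures above into relative gains,
here is a minimal sketch of the arithmetic. The helper names
percent_change and mpixels_per_sec are hypothetical and are not part of
Little CMS or its testbed.

```c
#include <assert.h>

/* Hypothetical helper: percentage change from `before` to `after`.
 * Positive means `after` is larger than `before`. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Hypothetical helper: throughput in MPixel/sec for `megapixels`
 * processed in `milliseconds`. */
static double mpixels_per_sec(double megapixels, double milliseconds)
{
    assert(milliseconds > 0.0);
    return megapixels / (milliseconds / 1000.0);
}
```

Fed the numbers quoted in this message, percent_change(11.9314, 14.4796)
puts the 8-bit CMYK test at about 21% faster, percent_change(34.4828,
38.5542) puts the 16-bit CLUT test at about 12% faster, and
percent_change(1485.0, 968.0) shows the 100-megapixel transform taking
about 35% less time.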

Source rearrangement notwithstanding, if one were to create routines
that make use of the vector instructions virtually every Intel system
already has (e.g., SSE2), the results could be markedly better still.
I've been through converting all my own software to use vectors, and
the results were well worth the effort.  We now run faster with 32-bit
floating point than we used to with integer formats.
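To make the vector idea concrete, here is a minimal sketch of a
floating-point linear interpolation done four values at a time. It is
only an illustration under my own assumptions: it is not Little CMS
code and not the optimizations described above, and the names
lerp_scalar, lerp_vector and LERP_HAVE_SSE2 are hypothetical. It falls
back to the plain loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define LERP_HAVE_SSE2 1
#else
#define LERP_HAVE_SSE2 0
#endif

/* Scalar reference: out[i] = a[i] + t * (b[i] - a[i]). */
static void lerp_scalar(const float *a, const float *b, float t,
                        float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + t * (b[i] - a[i]);
}

/* Same computation, four floats per iteration where SSE2 is available.
 * Unaligned loads and stores are used, so the buffers need no special
 * alignment. */
static void lerp_vector(const float *a, const float *b, float t,
                        float *out, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && out != NULL);
#if LERP_HAVE_SSE2
    const __m128 vt = _mm_set1_ps(t);           /* broadcast t to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vr = _mm_add_ps(va, _mm_mul_ps(vt, _mm_sub_ps(vb, va)));
        _mm_storeu_ps(out + i, vr);
    }
#endif
    /* Handle the remaining 0..3 elements (or everything, without SSE2). */
    lerp_scalar(a + i, b + i, t, out + i, n - i);
}
```

A call such as lerp_vector(a, b, 0.5f, out, n) should produce the same
values as lerp_scalar with the same arguments. Real interpolation
routines also do table lookups and multi-dimensional weighting, which
this sketch leaves out.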

There is also the further possibility of extending the Little CMS
algorithms onto the GPU for huge gains.  I suppose the trouble with that
would be figuring out which subsystem to use (OpenCL programs, OpenGL
shaders, Vulkan, others?).

-Noel
