Noel Carboni wrote:

> By the way, as an exercise to reinforce the above, I re-coded the 
> LittleCMS floating point trilinear interpolation algorithm using SSE2 
> intrinsics.  It ended up delivering the same performance as the C-coded 
> version.  Why not better?  Because the table-based design of the Little 
> CMS library doesn't suit parallel calculations so there were only limited 
> things I could do.

Simplex interpolation is generally faster since it touches
fewer node points (n+1 rather than 2^n for n input dimensions),
an advantage that grows with higher input dimensions. But simplex
isn't terribly parallelizable, since it involves a sort. Once the
weighting of each node is known, using either simplex or
multi-linear, parallelizing the output dimension calculations
is a good speedup though.
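
To make the split concrete, here is a minimal sketch in portable C
of simplex interpolation within a single grid cell. It is not code
from LittleCMS or any other color engine. The names simplex_weights,
simplex_interp and MAX_IN, and the cell layout (2^n corners indexed
by a bitmask, nout values per corner), are assumptions made up for
illustration. The sort and weight calculation are the serial part;
the final loop over output channels is the part that is independent
per channel and so lends itself to parallel or vector execution.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IN 8  /* hypothetical limit on input dimensions */

/* Compute the n+1 simplex nodes and weights for a point in a unit cell.
 * frac[i] in [0,1] is the fractional position along input axis i.
 * node[k] is a corner bitmask (bit i set = one step along axis i),
 * w[k] is the weight of that corner, for k = 0..n. */
static void simplex_weights(const double *frac, int n,
                            unsigned *node, double *w)
{
    int order[MAX_IN];
    assert(n >= 1 && n <= MAX_IN);

    /* Insertion sort of axis indices by descending fraction.
     * This is the serial step that resists parallelization. */
    for (int i = 0; i < n; i++) {
        int j = i;
        while (j > 0 && frac[order[j - 1]] < frac[i]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i;
    }

    /* Walk from the base corner, stepping along axes in sorted order.
     * Each weight is the difference between successive sorted fractions. */
    unsigned mask = 0;
    node[0] = 0;
    w[0] = 1.0 - frac[order[0]];
    for (int k = 1; k <= n; k++) {
        mask |= 1u << order[k - 1];
        node[k] = mask;
        w[k] = frac[order[k - 1]] - (k < n ? frac[order[k]] : 0.0);
    }
}

/* Interpolate nout output channels from one cell.
 * cell[corner * nout + c] holds channel c at the given corner bitmask. */
static void simplex_interp(const double *cell, int n, int nout,
                           const double *frac, double *out)
{
    unsigned node[MAX_IN + 1];
    double w[MAX_IN + 1];

    simplex_weights(frac, n, node, w);

    /* Each output channel is an independent weighted sum over the
     * same n+1 nodes, so this loop is the natural one to vectorize. */
    for (int c = 0; c < nout; c++) {
        double acc = 0.0;
        for (int k = 0; k <= n; k++)
            acc += w[k] * cell[(size_t)node[k] * nout + c];
        out[c] = acc;
    }
}
```

With three inputs this reads 4 corners per lookup where trilinear
reads 8. A multi-linear version could reuse the same output loop,
just with 2^n nodes and product weights in place of the sorted ones.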

[ How much of a win vector CPU instructions would be is
  not something I've ever had time to explore in my color
  engine, and I've been content to stick to portable C code,
  while wringing what I can out of it.
  Exploiting GPU texture lookup hardware seems far simpler to
  code for, for maximum overall speed. ]

Cheers,

Graeme Gill.

_______________________________________________
Lcms-user mailing list
Lcms-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lcms-user
