Noel Carboni wrote:
> By the way, as an exercise to reinforce the above, I re-coded the
> LittleCMS floating point trilinear interpolation algorithm using SSE2
> intrinsics. It ended up delivering the same performance as the C-coded
> version. Why not better? Because the table-based design of the Little
> CMS library doesn't suit parallel calculations so there were only limited
> things I could do.

Simplex interpolation is generally faster since it touches fewer node points (something that increases in importance with higher input dimensions), but simplex isn't terribly parallelizable, since it involves a sort. Once the weighting of each node is known, using either simplex or multi-linear, parallelizing the output dimension calculations is a good speedup though.

[ How much of a win vector CPU instructions would be is not something I've ever had time to explore in my color engine, and I've been content to stick to portable C code, while wringing what I can out of it. Exploiting GPU texture lookup hardware seems far simpler to code for, for maximum overall speed. ]

Cheers,
Graeme Gill.
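
For illustration, here is a minimal sketch in portable C of the two steps described above, for a single cell of a 3-input, 3-output table. It is not taken from LittleCMS or any other color engine; the names simplex_weights and blend_outputs, the corner bitmask encoding, and the flat node layout are all assumptions made for the example. The first function contains the sort that makes simplex awkward to vectorize; the second is the per-output-channel loop where every channel shares the same weights and so could be computed in parallel.

```c
#include <assert.h>

#define IN_DIM  3               /* input channels (assumed, e.g. RGB) */
#define OUT_DIM 3               /* output channels (assumed) */
#define N_VERT  (IN_DIM + 1)    /* a simplex touches IN_DIM + 1 nodes */

/* Hypothetical helper: given the fractional position f[] inside one grid
 * cell (each component in [0,1]), return the N_VERT cell corners that form
 * the enclosing simplex and their weights. A corner is encoded as a bitmask:
 * bit i set means "upper" node along input axis i. */
void simplex_weights(const double f[IN_DIM],
                     unsigned corner[N_VERT], double w[N_VERT])
{
    int order[IN_DIM];
    int i, j;
    unsigned mask = 0;

    for (i = 0; i < IN_DIM; i++) {
        assert(f[i] >= 0.0 && f[i] <= 1.0);
        order[i] = i;
    }

    /* The sort: order the axes by descending fractional coordinate.
     * This data-dependent step is what resists vectorization. */
    for (i = 1; i < IN_DIM; i++) {
        int k = order[i];
        for (j = i; j > 0 && f[order[j - 1]] < f[k]; j--)
            order[j] = order[j - 1];
        order[j] = k;
    }

    /* Walk from the base corner towards the opposite corner, switching
     * on one axis at a time in sorted order. The weights sum to 1. */
    corner[0] = 0;
    w[0] = 1.0 - f[order[0]];
    for (i = 0; i < IN_DIM; i++) {
        double next = (i + 1 < IN_DIM) ? f[order[i + 1]] : 0.0;
        mask |= 1u << order[i];
        corner[i + 1] = mask;
        w[i + 1] = f[order[i]] - next;
    }
}

/* Hypothetical helper: weighted sum of the selected nodes. nodes[] holds the
 * 2^IN_DIM corners of the cell, OUT_DIM values each, indexed by corner
 * bitmask. The inner loop over o is independent per output channel. */
void blend_outputs(const double *nodes, const unsigned corner[N_VERT],
                   const double w[N_VERT], double out[OUT_DIM])
{
    int k, o;

    for (o = 0; o < OUT_DIM; o++)
        out[o] = 0.0;
    for (k = 0; k < N_VERT; k++)
        for (o = 0; o < OUT_DIM; o++)
            out[o] += w[k] * nodes[corner[k] * OUT_DIM + o];
}
```

With 3 inputs this reads 4 nodes per lookup where trilinear reads 8, and the gap widens as inputs are added (N+1 nodes against 2^N). A multi-linear version would replace simplex_weights with a routine producing 2^N weights and could reuse the same blending loop.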