Hi,

Back again :-) This weekend, I implemented some basic premultiplied alpha support.
I agree with Bob and John that premultiplied alpha doesn't "make a lot of sense with a color management system since color perception varies with intensity and the colorspace may not be linear across the color channels". But such color management does make sense for images with binary transparency (full opacity and full transparency), where only a few pixels have intermediate transparency (usually the anti-aliased edges).

LittleCMS 2.4 was not able to handle this correctly, and the only way to manage such images was to:
- turn them into unpremultiplied alpha
- run cmsDoTransform (unoptimized, because even fully transparent pixels are processed)
- turn them back into premultiplied alpha

It was way too slow, so I coded some limited support for premultiplied alpha. It works if:
- the input format is TYPE_BGRA_8
- the output format is TYPE_BGRA_8 too
- the transform goes through the internal matrix optimization, because my code (hack) is using MatShaper8Data

A new flag has been added: PREMUL_SH. It works only with TYPE_BGRA_8.

But the code, right now, looks like a quick and dirty hack :-( First, because I'm not used to the internals of this library. And second, because it doesn't respect the current pipeline: premultiplied alpha means extra computation and I need very fast processing, so I had to merge the InputFormatters, xform and OutputFormatters into one routine.

In LittleCMS 2.4, about 100 CPU cycles were needed to process one BGRA pixel. The optimized code needs about 28 CPU cycles (unlinked alpha), and the optimized premultiplied alpha code requires about 65 CPU cycles in the worst case (no pixel with an alpha of 0 or 1.0).

I have done some benchmarking too, using an ECI-tagged image + 2 alpha masks + a random screen profile. The goal was to process an ECI image and convert it into the screen color space.

Right now, the code is fast enough for my needs. Of course, if someone can help me integrate this code the clean way, I would be happy to contribute seriously.
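For anyone following along, the slow three-step route described above (unpremultiply, transform, premultiply again) can be sketched per pixel roughly as follows. This is only a minimal, self-contained illustration and not the code from the zip: transform_premul_bgra8, pixel_xform_fn and demo_invert are hypothetical names, and the callback merely stands in for the real color transform (in lcms, cmsDoTransform works on whole buffers, not on single pixels). The shortcuts for alpha 0 and alpha 255 are an assumption about why mostly binary masks are cheap to process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the color transform of one opaque BGR pixel. */
typedef void (*pixel_xform_fn)(const uint8_t in_bgr[3], uint8_t out_bgr[3]);

/* Placeholder "transform" used only for demonstration: inverts each channel. */
static void demo_invert(const uint8_t in_bgr[3], uint8_t out_bgr[3])
{
    for (int k = 0; k < 3; ++k)
        out_bgr[k] = (uint8_t)(255 - in_bgr[k]);
}

/* Premultiplied -> straight: divide by alpha, round to nearest, clamp. */
static uint8_t unpremul8(uint8_t c, uint8_t a)
{
    assert(a != 0);                       /* caller handles alpha == 0 */
    unsigned v = (c * 255u + a / 2u) / a;
    return (uint8_t)(v > 255u ? 255u : v);
}

/* Straight -> premultiplied: multiply by alpha, round to nearest. */
static uint8_t premul8(uint8_t c, uint8_t a)
{
    return (uint8_t)((c * a + 127u) / 255u);
}

/* Transform a buffer of premultiplied BGRA8 pixels (alpha in byte 3). */
static void transform_premul_bgra8(const uint8_t *in, uint8_t *out,
                                   size_t npixels, pixel_xform_fn xform)
{
    for (size_t i = 0; i < npixels; ++i, in += 4, out += 4) {
        uint8_t a = in[3];

        if (a == 0) {
            /* Fully transparent: nothing to transform. */
            out[0] = out[1] = out[2] = out[3] = 0;
        } else if (a == 255) {
            /* Fully opaque: premultiplied equals straight, transform directly. */
            xform(in, out);
            out[3] = 255;
        } else {
            /* Intermediate alpha: unpremultiply, transform, premultiply again. */
            uint8_t straight[3], result[3];
            for (int k = 0; k < 3; ++k)
                straight[k] = unpremul8(in[k], a);
            xform(straight, result);
            for (int k = 0; k < 3; ++k)
                out[k] = premul8(result[k], a);
            out[3] = a;
        }
    }
}
```

Calling transform_premul_bgra8(in, out, n, demo_invert) processes n pixels. Only the pixels with intermediate alpha pay for the two extra conversions.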
In the future, some real optimization could be done (there is no SSE or OpenCL vectorization code so far).

The modified code (derived from LittleCMS 2.4) + test program can be found here:
http://sebastienleon.com/info/littleCMS/littleCMS_PreMulAlphaHack.zip
(I give all copyrights to Marti)

Qt 4.x is required to build the test program. (It works on Mac/Linux/Windows: run "qmake && make" and then "./test".)

Best regards
Sebastien Léon

-----------------------------------------
LittleCMS Test/Hacks & simple benchmarking...
Init OK...
******* Start TEST : LittleCMS 2.4 Legacy *******
(test 0 lasts 683725 KCycles).
(test 1 lasts 686102 KCycles).
(test 2 lasts 686626 KCycles).
(test 3 lasts 683895 KCycles).
Average Test lasts 685087 KCycles.
Average CPU Cycle per pixel = 99.85.
-------------------------------------------
******* Start TEST : LittleCMS 2.4 + Unroll3BytesSkip1SwapExtFirst *******
(test 0 lasts 406031 KCycles).
(test 1 lasts 405165 KCycles).
(test 2 lasts 406182 KCycles).
(test 3 lasts 404195 KCycles).
Average Test lasts 405393 KCycles.
Average CPU Cycle per pixel = 59.09.
-------------------------------------------
******* Start TEST : RGBAEngineWithAlphaIgnored *******
(test 0 lasts 189697 KCycles).
(test 1 lasts 188906 KCycles).
(test 2 lasts 191982 KCycles).
(test 3 lasts 190995 KCycles).
Average Test lasts 190395 KCycles.
Average CPU Cycle per pixel = 27.75.
-------------------------------------------
******* Start TEST : PreMulEngineWithNoAlpha *******
(test 0 lasts 209824 KCycles).
(test 1 lasts 208821 KCycles).
(test 2 lasts 208711 KCycles).
(test 3 lasts 207210 KCycles).
Average Test lasts 208641 KCycles.
Average CPU Cycle per pixel = 30.41.
-------------------------------------------
******* Start TEST : PreMulEngineWithPreMulAlpha_WorstCase *******
(test 0 lasts 444388 KCycles).
(test 1 lasts 447862 KCycles).
(test 2 lasts 443876 KCycles).
(test 3 lasts 439628 KCycles).
Average Test lasts 443939 KCycles.
Average CPU Cycle per pixel = 64.70.
-------------------------------------------
******* Start TEST : PreMulEngineWithPreMulAlpha_SpriteCase *******
(test 0 lasts 132157 KCycles).
(test 1 lasts 130319 KCycles).
(test 2 lasts 131186 KCycles).
(test 3 lasts 150089 KCycles).
Average Test lasts 135938 KCycles.
Average CPU Cycle per pixel = 19.81.
-------------------------------------------
Work's done...

_______________________________________________
Lcms-user mailing list
Lcms-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lcms-user