Hi, I thought I'd see if I could speed up PNG loading by vectorizing alpha premultiplication, and it actually does give a nice speedup:

commit d7d592b0acb25ad8084b1d60459dd40bfd9c3356 (HEAD -> png-faster, github/png-faster)
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Tue Aug 8 21:29:25 2017 -0700

    Process four pixels at a time in premultiply_data() PNG function

    Load/store using memcpy().  Now this is finally faster than the
    non-vectorized code.  The premultiply_data() overhead is reduced
    by 60%.

    Numbers for:
    $ ftbench -b a ~/.fonts/NotoColorEmoji.ttf

    Without premultiply_data:       155 us/op
    With 4-pixel vectorization:     167 us/op <---------
    Without vectorization:          182 us/op

Code here:

  https://github.com/behdad/freetype/commits/png-faster

The code is rather terse but readable. I can add comments. It still needs some GCC/clang checks, and the big-endian case needs to be either implemented or disabled. I couldn't find any endianness macros in FreeType.

Cheers,

-- 
behdad
http://behdad.org/
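
P.S. For readers who want the idea without opening the branch, here is a rough standalone sketch. It is not the code from the commit: the names mul_alpha, premultiply_scalar, premultiply_vec4 and HAVE_VEC are made up for illustration, it only premultiplies RGBA in place (no channel reordering), and it assumes a compiler with GCC-style vector extensions. The endianness guard uses the __BYTE_ORDER__ macro that GCC and clang predefine, which is one possible answer to the missing-macro problem mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounded (alpha * color) / 255 without a division. */
static uint8_t mul_alpha(uint8_t alpha, uint8_t color)
{
    unsigned t = (unsigned)alpha * color + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scalar reference: premultiply n RGBA pixels in place. */
static void premultiply_scalar(uint8_t *px, size_t n)
{
    for (size_t i = 0; i < n; i++, px += 4) {
        uint8_t a = px[3];
        px[0] = mul_alpha(a, px[0]);
        px[1] = mul_alpha(a, px[1]);
        px[2] = mul_alpha(a, px[2]);
    }
}

/* Assumption: GCC/clang vector extensions, little-endian target only. */
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define HAVE_VEC 1
typedef uint32_t u32x4 __attribute__((vector_size(16)));
#endif

/* Four pixels per iteration; leftovers go through the scalar path. */
static void premultiply_vec4(uint8_t *px, size_t n)
{
#ifdef HAVE_VEC
    const u32x4 m_rb    = {0x00FF00FF, 0x00FF00FF, 0x00FF00FF, 0x00FF00FF};
    const u32x4 m_lo    = {0xFF, 0xFF, 0xFF, 0xFF};
    const u32x4 half_rb = {0x00800080, 0x00800080, 0x00800080, 0x00800080};
    const u32x4 half_g  = {0x80, 0x80, 0x80, 0x80};
    const u32x4 s8      = {8, 8, 8, 8};
    const u32x4 s24     = {24, 24, 24, 24};

    for (; n >= 4; n -= 4, px += 16) {
        u32x4 p, a, rb, g;

        memcpy(&p, px, 16);            /* unaligned-safe load of 4 pixels */
        a = p >> s24;                  /* alpha is the top byte of each lane */

        /* R and B sit in separate 16-bit halves, so one multiply does both. */
        rb = (p & m_rb) * a + half_rb;
        rb = ((rb + ((rb >> s8) & m_rb)) >> s8) & m_rb;

        g = ((p >> s8) & m_lo) * a + half_g;
        g = (g + (g >> s8)) >> s8;

        p = rb | (g << s8) | (a << s24);
        memcpy(px, &p, 16);            /* unaligned-safe store */
    }
#endif
    premultiply_scalar(px, n);
}
```

The memcpy() load/store keeps unaligned row data safe and avoids aliasing problems; compilers normally turn it into a plain vector move. On a big-endian target the guard fails and everything falls back to the scalar loop.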
_______________________________________________
Freetype-devel mailing list
Freetype-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/freetype-devel