Hi,

I thought I'd see if I could speed up PNG loading by vectorizing alpha
premultiplication, and it actually does give a nice speedup:

commit d7d592b0acb25ad8084b1d60459dd40bfd9c3356 (HEAD -> png-faster, github/png-faster)
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Tue Aug 8 21:29:25 2017 -0700

    Process four pixels at a time in premultiply_data() PNG function

    Load/store using memcpy().  Now this is finally faster than the
    non-vectorized code.  The premultiply_data() overhead is reduced by 60%.

    Numbers for:

    $ ftbench -b a ~/.fonts/NotoColorEmoji.ttf

    Without premultiply_data:       155 us/op
    With 4-pixel vectorization:     167 us/op <---------
    Without vectorization:          182 us/op

Code here:

  https://github.com/behdad/freetype/commits/png-faster
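
For readers who don't want to open the branch, here is a rough sketch of the
general idea; it is not the code from the commit.  It assumes RGBA input that
is converted in place to premultiplied BGRA, and the names mul_alpha(),
premultiply_scalar() and premultiply_4px() are made up for illustration.
Instead of GCC vector types it uses fixed-length lane loops over 16 bytes
(four pixels), loaded and stored with memcpy(), which a compiler may or may
not auto-vectorize:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounded alpha * color / 255 using only shifts and adds. */
static unsigned int
mul_alpha( unsigned int  alpha,
           unsigned int  color )
{
  unsigned int  t = alpha * color + 0x80;

  return ( t + ( t >> 8 ) ) >> 8;
}

/* Scalar reference: RGBA in, premultiplied BGRA out, in place. */
static void
premultiply_scalar( unsigned char*  p,
                    size_t          n_pixels )
{
  size_t  i;

  for ( i = 0; i < n_pixels; i++, p += 4 )
  {
    unsigned int  a = p[3];
    unsigned int  r = mul_alpha( a, p[0] );
    unsigned int  g = mul_alpha( a, p[1] );
    unsigned int  b = mul_alpha( a, p[2] );

    p[0] = (unsigned char)b;
    p[1] = (unsigned char)g;
    p[2] = (unsigned char)r;
  }
}

/* Four pixels (16 bytes) per iteration; leftovers go to the scalar path. */
static void
premultiply_4px( unsigned char*  p,
                 size_t          n_pixels )
{
  size_t  i = 0;

  for ( ; i + 4 <= n_pixels; i += 4, p += 16 )
  {
    unsigned char  in[16], out[16];
    uint16_t       a[16], t[16];
    int            k;

    memcpy( in, p, 16 );               /* load without alignment concerns */

    for ( k = 0; k < 16; k++ )
      a[k] = in[k | 3];                /* each lane gets its pixel's alpha */

    for ( k = 0; k < 16; k++ )         /* every intermediate fits in 16 bits */
    {
      t[k] = (uint16_t)( a[k] * in[k] + 0x80 );
      t[k] = (uint16_t)( ( t[k] + ( t[k] >> 8 ) ) >> 8 );
    }

    for ( k = 0; k < 16; k += 4 )      /* swap R and B, keep alpha as is */
    {
      out[k + 0] = (unsigned char)t[k + 2];
      out[k + 1] = (unsigned char)t[k + 1];
      out[k + 2] = (unsigned char)t[k + 0];
      out[k + 3] = in[k + 3];
    }

    memcpy( p, out, 16 );              /* store back in place */
  }

  premultiply_scalar( p, n_pixels - i );
}
```

Since this sketch only ever indexes bytes, it does not depend on the host
byte order.  Whether it beats the scalar loop depends on the compiler and
flags, so it would need measuring with ftbench like the numbers above.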

The code is rather terse but readable.  I can add comments.  It still needs
some GCC/clang checks, plus either a big-endian implementation or a way to
disable it on big-endian.  I couldn't find any endianness macros in FreeType.
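
One possible shape for the endianness check, as a sketch only.  GCC and clang
predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__;
everything else here (the PNG_SKETCH_LITTLE_ENDIAN macro and both function
names) is hypothetical and not something FreeType provides.  Compilers
without those macros simply take the portable path:

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check; 0 also covers "unknown compiler or byte order". */
#if defined( __BYTE_ORDER__ ) && defined( __ORDER_LITTLE_ENDIAN__ ) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define PNG_SKETCH_LITTLE_ENDIAN  1
#else
#define PNG_SKETCH_LITTLE_ENDIAN  0
#endif

/* Run-time cross-check: inspect the first byte of the integer 1. */
static int
is_little_endian_runtime( void )
{
  uint32_t       word = 1;
  unsigned char  first;

  memcpy( &first, &word, 1 );
  return first == 1;
}

/* Load one pixel so that p[0] always ends up in the low 8 bits. */
static uint32_t
load_pixel_le( const unsigned char*  p )
{
#if PNG_SKETCH_LITTLE_ENDIAN
  uint32_t  v;

  memcpy( &v, p, 4 );                  /* fast path, host order matches */
  return v;
#else
  return  (uint32_t)p[0]         |     /* portable byte-by-byte fallback */
        ( (uint32_t)p[1] << 8  ) |
        ( (uint32_t)p[2] << 16 ) |
        ( (uint32_t)p[3] << 24 );
#endif
}
```

The same macro could be used the other way around, to compile the vectorized
path only when it is 1 and keep the existing scalar loop otherwise.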

Cheers,
-- 
behdad
http://behdad.org/
