Hi,

While working on PNG DSP, I realized that the Up filter does extra overflow checks to handle the trailing bytes when the width is not a multiple of the long word size. This logic is present in both the C and the x86 ASM versions. Here is the current C code for reference:
-------------- 8< ----------------
// 0x7f7f7f7f or 0x7f7f7f7f7f7f7f7f or whatever, depending on the cpu's native arithmetic size
#define pb_7f (~0UL / 255 * 0x7f)
#define pb_80 (~0UL / 255 * 0x80)

static void add_bytes_l2_c(uint8_t *dst, uint8_t *src1, uint8_t *src2, int w)
{
    long i;
    /* word-at-a-time add over every whole long that fits in w */
    for (i = 0; i <= w - (int) sizeof(long); i += sizeof(long)) {
        long a = *(long *)(src1 + i);
        long b = *(long *)(src2 + i);
        *(long *)(dst + i) = ((a & pb_7f) + (b & pb_7f)) ^ ((a ^ b) & pb_80);
    }
    /* scalar tail for the remaining w % sizeof(long) bytes */
    for (; i < w; i++)
        dst[i] = src1[i] + src2[i];
}
-------------- 8< ----------------

The thing is, the buffers seem to be zero-padded to 16 bytes (see the av_fast_padded_malloc() calls). I'm assuming there are cases where they're not? In any case, it looks like either the zero padding or the overflow checks should go away. Removing the checks would obviously make things much simpler, but I'm not sure that's possible.

-- 
Clément B.
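For illustration, a minimal sketch (not FFmpeg code) of the C side with the tail loop dropped, assuming every buffer passed in has at least w bytes rounded up to a multiple of sizeof(long) allocated, which 16 bytes of padding would cover. add_bytes_l2_padded and check_width are hypothetical names. The sketch uses memcpy and unsigned long instead of the pointer casts above to sidestep alignment and aliasing concerns, and check_width compares the result against a plain byte-by-byte loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0x7f7f...7f and 0x8080...80, sized to the CPU's native long. */
#define pb_7f (~0UL / 255 * 0x7f)
#define pb_80 (~0UL / 255 * 0x80)

/* Hypothetical variant with no scalar tail loop. ASSUMPTION: dst, src1 and
 * src2 each have at least w rounded up to a multiple of sizeof(long) bytes
 * allocated (e.g. through padding), so the last word may read and write a
 * few bytes past w. */
static void add_bytes_l2_padded(uint8_t *dst, const uint8_t *src1,
                                const uint8_t *src2, int w)
{
    assert(w >= 0);
    for (long i = 0; i < w; i += (long)sizeof(long)) {
        unsigned long a, b, r;
        /* memcpy avoids unaligned and type-punned pointer accesses. */
        memcpy(&a, src1 + i, sizeof(a));
        memcpy(&b, src2 + i, sizeof(b));
        /* Adding the low 7 bits of each byte can never carry into the next
         * byte; the XOR then sets bit 7 to a7 ^ b7 ^ carry-in, and the carry
         * out of bit 7 is dropped, so each byte wraps mod 256. */
        r = ((a & pb_7f) + (b & pb_7f)) ^ ((a ^ b) & pb_80);
        memcpy(dst + i, &r, sizeof(r));
    }
}

/* Checks the padded variant against a plain byte-by-byte loop for one
 * width, using deterministic pseudo-random input. Returns 1 on success. */
static int check_width(int w)
{
    enum { PAD = 16, MAXW = 256 };
    uint8_t s1[MAXW + PAD], s2[MAXW + PAD], d[MAXW + PAD];
    uint32_t seed = 12345u + (uint32_t)w;

    assert(w >= 0 && w <= MAXW);
    for (int j = 0; j < MAXW + PAD; j++) {
        seed = seed * 1103515245u + 12345u;   /* simple LCG */
        s1[j] = (uint8_t)(seed >> 16);
        seed = seed * 1103515245u + 12345u;
        s2[j] = (uint8_t)(seed >> 16);
    }
    add_bytes_l2_padded(d, s1, s2, w);
    for (int j = 0; j < w; j++)
        if (d[j] != (uint8_t)(s1[j] + s2[j]))
            return 0;
    return 1;
}
```

If any caller can pass a buffer without that padding, the last iteration reads and writes up to sizeof(long) - 1 bytes past w, so either the tail loop or a documented padding guarantee has to stay.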
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel