Hi,

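While working on the PNG DSP code, I realized that the "up" filter does
extra overflow checks: the trailing bytes, when the width is not a multiple
of the long word size, are handled separately so that the word-sized loop
never runs past the end of the buffer. This logic is present in both the C
and the x86 ASM versions. Here is the current C code for reference: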

-------------- 8< ----------------

// 0x7f7f7f7f or 0x7f7f7f7f7f7f7f7f or whatever, depending on the cpu's native
// arithmetic size
#define pb_7f (~0UL / 255 * 0x7f)
#define pb_80 (~0UL / 255 * 0x80)

static void add_bytes_l2_c(uint8_t *dst, uint8_t *src1, uint8_t *src2, int w)
{
    long i;
    for (i = 0; i <= w - (int) sizeof(long); i += sizeof(long)) {
        long a = *(long *)(src1 + i);
        long b = *(long *)(src2 + i);
        *(long *)(dst + i) = ((a & pb_7f) + (b & pb_7f)) ^ ((a ^ b) & pb_80);
    }
    for (; i < w; i++)
        dst[i] = src1[i] + src2[i];
}

-------------- 8< ----------------
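To spell out what the word-sized loop computes, here is a minimal sketch that checks the masking trick against plain per-byte addition. Masking with pb_7f clears the top bit of every byte, so the per-byte sums cannot carry into the neighbouring byte; the top bit is then restored by XORing with (a ^ b) & pb_80. The names swar_add_bytes, ref_add_bytes and self_check are made up for this illustration and are not part of the FFmpeg code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PB_7F (~0UL / 255 * 0x7f)
#define PB_80 (~0UL / 255 * 0x80)

/* Add every byte lane of a and b modulo 256, with no carry between lanes
 * (same expression as in add_bytes_l2_c, on unsigned operands). */
static unsigned long swar_add_bytes(unsigned long a, unsigned long b)
{
    return ((a & PB_7F) + (b & PB_7F)) ^ ((a ^ b) & PB_80);
}

/* Reference: unpack both words and add them one byte at a time. */
static unsigned long ref_add_bytes(unsigned long a, unsigned long b)
{
    uint8_t x[sizeof(long)], y[sizeof(long)];
    unsigned long r;

    memcpy(x, &a, sizeof(a));
    memcpy(y, &b, sizeof(b));
    for (size_t i = 0; i < sizeof(long); i++)
        x[i] = (uint8_t)(x[i] + y[i]);
    memcpy(&r, x, sizeof(r));
    return r;
}

/* Compare both versions on a few words that exercise carries and top bits. */
static void self_check(void)
{
    const unsigned long samples[] = {
        0UL, ~0UL, PB_7F, PB_80, 0x01fe80ffUL, 0x7f8001ffUL
    };
    const size_t n = sizeof(samples) / sizeof(samples[0]);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(swar_add_bytes(samples[i], samples[j]) ==
                   ref_add_bytes(samples[i], samples[j]));
}
```

Calling self_check() from a test program is enough to exercise it; for instance, 0xff + 0x01 in the lowest byte wraps to 0x00 without touching the byte above it.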

The thing is, the buffers seem to be zero-padded to 16 bytes (see the
av_fast_padded_malloc() calls). I'm assuming there are cases where they
are not?

In any case, it looks like either the zero padding or the overflow checks
should go away. Removing the checks would obviously make things much simpler,
but I'm not sure that's possible.
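For the sake of discussion, this is roughly what the function could look like without the tail loop. It is only a sketch under a strong assumption that has not been verified here: all three buffers must be readable (and dst writable) up to w rounded up to a multiple of sizeof(long), which the 16-byte padding would guarantee if it is always present. The name add_bytes_l2_padded is hypothetical, and memcpy is used for the word loads and stores, which is a choice made for this sketch rather than what the current code does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PB_7F (~0UL / 255 * 0x7f)
#define PB_80 (~0UL / 255 * 0x80)

/* Hypothetical variant with no byte-wise tail loop.
 * ASSUMPTION: dst, src1 and src2 are each padded so that accessing them up
 * to w rounded up to a multiple of sizeof(long) is valid. The last
 * iteration reads the source padding and overwrites the dst padding. */
static void add_bytes_l2_padded(uint8_t *dst, const uint8_t *src1,
                                const uint8_t *src2, int w)
{
    assert(w >= 0);
    for (int i = 0; i < w; i += (int)sizeof(long)) {
        unsigned long a, b, r;

        /* memcpy keeps the word accesses free of alignment and
         * aliasing problems */
        memcpy(&a, src1 + i, sizeof(a));
        memcpy(&b, src2 + i, sizeof(b));
        r = ((a & PB_7F) + (b & PB_7F)) ^ ((a ^ b) & PB_80);
        memcpy(dst + i, &r, sizeof(r));
    }
}
```

Note the side effect: the padding bytes of dst end up holding the sum of the source padding bytes, so they only stay zero if both sources are zero-padded.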

-- 
Clément B.


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
