On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <[email protected]> wrote:
> Hi,
>
> On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <
> [email protected]> wrote:
>
>>          else if (lc->cu.pred_mode == MODE_INTRA && c_idx == 0 &&
>>                   log2_trafo_size == 2)
>> -            s->hevcdsp.transform_4x4_luma_add(dst, coeffs, stride);
>> +            s->hevcdsp.idct_4x4_luma(coeffs);
>>
>
> This is not an idct.

transform_4x4_luma would be a better name, then.
>
>
>> +    s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
>
>
> Won't this be slower since there's a memory store intermediate?
>
> (I know it's faster now because you don't have inverse transform simd, but
> you should fix that by writing inverse transform simd, not by splitting the
> transform and the add.)
>

Separating the residual add from the transform does seem to cause some
slowdown, but it is needed to separate the DC case from the IDCT, which
is faster overall. I consider that a good reason to do this. Sure, a
SIMD IDCT is needed, and I'm working on it.
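
To illustrate the split, here is a rough sketch in C. It is not the code
from the patch: the names dc_only_4x4, add_residual_4x4 and clip_u8 are
made up for this example, and the rounding assumes 8-bit samples. The
point is only that once the add is a separate step, a DC-only block can
skip the full transform and still share the same add routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical DC-only shortcut: when only coeffs[0] is non-zero, every
 * residual sample has the same value, so the block is simply filled.
 * The shift/rounding below assumes 8-bit depth. */
static void dc_only_4x4(int16_t *coeffs)
{
    int shift = 14 - 8;
    int add   = 1 << (shift - 1);
    int dc    = (((coeffs[0] + 1) >> 1) + add) >> shift;

    for (int i = 0; i < 16; i++)
        coeffs[i] = (int16_t)dc;
}

/* Shared add step: reads the residual back from the coefficient buffer
 * (this is the intermediate store) and adds it to the prediction. */
static void add_residual_4x4(uint8_t *dst, const int16_t *res,
                             ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;
        res += 4;
    }
}
```

A caller would pick dc_only_4x4 when only the DC coefficient is present,
run the full inverse transform otherwise, and call add_residual_4x4 once
in either case.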
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel