On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <[email protected]> wrote:
> Hi,
>
> On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <[email protected]> wrote:
>
>>      else if (lc->cu.pred_mode == MODE_INTRA && c_idx == 0 &&
>>               log2_trafo_size == 2)
>> -        s->hevcdsp.transform_4x4_luma_add(dst, coeffs, stride);
>> +        s->hevcdsp.idct_4x4_luma(coeffs);
>
> This is not an idct.

transform_4x4_luma would be a better name then.

>> +    s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
>
> Won't this be slower since there's a memory store intermediate?
>
> (I know it's faster now because you don't have inverse transform simd, but
> you should fix that by writing inverse transform simd, not by splitting the
> transform and the add.)

Separating the residual add from the transform does seem to cause some
slowdown, but it is needed to separate the DC case from the IDCT, which is
faster overall. I consider that a good reason to do this.

Sure, a SIMD IDCT is needed and I'm working on it.

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
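To illustrate the split discussed in this thread, here is a minimal C sketch of why a separate residual add makes a DC-only shortcut possible. It is not the code from the patch: the names clip_u8, idct_4x4_dc, add_residual_4x4, reconstruct_4x4 and the dc_only flag are hypothetical, the samples are assumed to be 8-bit, and the DC scaling is only illustrative. The full inverse transform is passed in as a function pointer and is not implemented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for a full in-place inverse transform (not shown here). */
typedef void (*transform_fn)(int16_t *coeffs);

/* Clamp an int to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical DC-only path: when only coeffs[0] is nonzero, every residual
 * sample has the same value, so the block is filled with one scaled constant.
 * The scaling assumes 8-bit samples and is illustrative only. */
static void idct_4x4_dc(int16_t *coeffs)
{
    const int shift = 14 - 8;
    const int add   = 1 << (shift - 1);
    const int dc    = (((coeffs[0] + 1) >> 1) + add) >> shift;

    for (int i = 0; i < 16; i++)
        coeffs[i] = (int16_t)dc;
}

/* Hypothetical stand-in for add_residual[0]: add a 4x4 residual to dst,
 * clamping each result to the sample range. */
static void add_residual_4x4(uint8_t *dst, const int16_t *res, ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + res[y * 4 + x]);
        dst += stride;
    }
}

/* Pick the cheap DC path or the full transform, then share one residual add.
 * The intermediate store to coeffs is the cost raised in the review. */
static void reconstruct_4x4(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride,
                            int dc_only, transform_fn full)
{
    assert(dc_only || full != NULL);

    if (dc_only)
        idct_4x4_dc(coeffs);
    else
        full(coeffs);

    add_residual_4x4(dst, coeffs, stride);
}
```

With a fused transform-and-add function, each transform variant would need its own add loop; here both paths write the residual back to coeffs and reuse add_residual_4x4, at the price of the extra memory round trip.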
