Hi,

On Thu, Jul 7, 2016 at 9:52 AM, Alexandra Hájková <alexandra.khirn...@gmail.com> wrote:
> On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>
> > On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <alexandra.khirn...@gmail.com> wrote:
> >
> > > + s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
> >
> > Won't this be slower since there's a memory store intermediate?
> >
> > (I know it's faster now because you don't have inverse transform simd, but
> > you should fix that by writing inverse transform simd, not by splitting the
> > transform and the add.)
>
> Separating adding residual from the transform seems to cause certain
> slow down but is needed to separate dc from idct which is faster overall,
> which I consider a good reason to do this.

I'm not sure I understand why, could you elaborate on this?

> Sure, simd IDCT is needed and I'm working on it.

Great!

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel