Hi,

On Thu, Jul 7, 2016 at 10:53 AM, Ronald S. Bultje <[email protected]> wrote:
> Hi,
>
> On Thu, Jul 7, 2016 at 9:52 AM, Alexandra Hájková <[email protected]> wrote:
>
>> On Thu, Jul 7, 2016 at 1:53 PM, Ronald S. Bultje <[email protected]> wrote:
>> > On Thu, Jul 7, 2016 at 5:25 AM, Alexandra Hájková <[email protected]> wrote:
>> > > + s->hevcdsp.add_residual[log2_trafo_size - 2](dst, coeffs, stride);
>> >
>> > Won't this be slower since there's a memory store intermediate?
>> >
>> > (I know it's faster now because you don't have inverse transform simd, but
>> > you should fix that by writing inverse transform simd, not by splitting the
>> > transform and the add.)
>>
>> Separating adding residual from the transform seems to cause certain
>> slow down but is needed to separate dc from idct which is faster overall,
>> which I consider a good reason to do this.
>
> I'm not sure I understand why, could you elaborate on this?
>
>> Sure, simd IDCT is needed and I'm working on it.
>
> Great!

Btw I'm just noticing that all my comments apply to the ffmpeg codebase
also, so perhaps you should just ignore my comments and we can fix that
later on...

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
