On 19 July 2018 at 16:52, James Darnley <jdarn...@obe.tv> wrote:

> On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> > On 19 July 2018 at 15:52, James Darnley <jdarn...@obe.tv> wrote:
> >
> >>                                int32_t *b1, int32_t *b2, int
> >>          b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
> >>  }
> >>
> >> +static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
> >> +                                  int32_t *b3, int32_t *b4, int width)
> >> +{
> >> +    int i = width & ~3;
> >> +    ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
> >> +    for(; i<width; i++)
> >> +        b2[i] = COMPOSE_DD97iH0(b0[i], b1[i], b2[i], b3[i], b4[i]);
> >> +
> >> +}
> >>
> >
> >
> > This, along with the rest of the patchset: what's up with the hybrid
> > implementations? Couldn't you put the second part in the asm code as well?
> > Now there are 2 function calls instead of 1.
>
> The 8-bit code does this and I just followed its lead.  I believe this is
> done because we cannot write junk data beyond what we think is the end
> of the line because this might be one of the higher depths and the
> coeffs for the next level sit beyond the end of the line.
>
> But now it has just occurred to me that maybe you meant "why didn't you
> do the scalar operations in SIMD?", is that what you meant?  Answer is
> because it didn't occur to me at the time.  Aside from that I always
> write do-while loops in assembly because I can usually guarantee 1 run
> of the block.
>
> I can certainly look at making that change.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
Yep, I think you ought to put the scalar code in the asm.
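
For reference, here is a rough C sketch of the two shapes being discussed: the hybrid wrapper from the quoted patch (block kernel plus scalar tail in the caller) and a single function that handles both the blocks and the tail itself. It is only an illustration. lift_step() is a hypothetical stand-in whose formula is assumed, not copied from COMPOSE_DD97iH0, and vertical_hi_blocks() is a plain C stand-in for the SSE2 kernel; all function names here are made up.

```c
#include <stdint.h>

/* Hypothetical stand-in for COMPOSE_DD97iH0. The formula is an assumption
 * made for this sketch; the real macro is defined in FFmpeg's Dirac DWT
 * code and is not reproduced here. */
static int32_t lift_step(int32_t b0, int32_t b1, int32_t b2,
                         int32_t b3, int32_t b4)
{
    return b2 + ((9 * b1 + 9 * b3 - b0 - b4 + 8) >> 4);
}

/* Plain C stand-in for the SIMD kernel: works on groups of 4 coefficients
 * and must only be given a width that is a multiple of 4. */
static void vertical_hi_blocks(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    for (int i = 0; i < width; i += 4)
        for (int j = 0; j < 4; j++)
            b2[i + j] = lift_step(b0[i + j], b1[i + j], b2[i + j],
                                  b3[i + j], b4[i + j]);
}

/* Hybrid shape, as in the quoted patch: one call for the full blocks,
 * then a scalar loop in the caller for the remaining 0..3 coefficients. */
static void vertical_hi_hybrid(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    int i = width & ~3;                 /* round down to a multiple of 4 */
    vertical_hi_blocks(b0, b1, b2, b3, b4, i);
    for (; i < width; i++)
        b2[i] = lift_step(b0[i], b1[i], b2[i], b3[i], b4[i]);
}

/* Single-function shape: full blocks and the tail are handled in one
 * place, and nothing at or beyond b2[width] is ever written. */
static void vertical_hi_single(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    int i = 0;
    /* Block loop; the condition is checked first, so width < 4 is safe. */
    for (; i + 4 <= width; i += 4)
        for (int j = 0; j < 4; j++)
            b2[i + j] = lift_step(b0[i + j], b1[i + j], b2[i + j],
                                  b3[i + j], b4[i + j]);
    /* Scalar tail for the last width % 4 coefficients. */
    for (; i < width; i++)
        b2[i] = lift_step(b0[i], b1[i], b2[i], b3[i], b4[i]);
}
```

Both versions leave everything from b2[width] onwards alone, which is the constraint James mentions about the next level's coefficients sitting past the end of the line. Note that the block loop in the single version tests its condition before the first iteration; a do-while block loop would additionally need a guard for width < 4.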