On Tue, Sep 24, 2024 at 5:44 AM Zhao Zhili <quinkbl...@foxmail.com> wrote:
> > On Sep 18, 2024, at 21:11, Zhao Zhili <quinkbl...@foxmail.com> wrote:
> > From: Zhao Zhili <zhiliz...@tencent.com>
> >
> > Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
> > Add a simple wrapper to handle the non-aligned part.
> >
> > Signed-off-by: Zhao Zhili <zhiliz...@tencent.com>
> > Co-authored-by: johzzy <hellojinqi...@gmail.com>
> > ---
> > v2: test width 2 and 540
> >
> >  libswscale/aarch64/rgb2rgb.c | 23 ++++++++++++++++++++++-
> >  tests/checkasm/sw_rgb.c      |  2 +-
> >  2 files changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
> > index d978a6f173..20a25033cb 100644
> > --- a/libswscale/aarch64/rgb2rgb.c
> > +++ b/libswscale/aarch64/rgb2rgb.c
> > @@ -27,9 +27,30 @@
> >  #include "libswscale/swscale.h"
> >  #include "libswscale/swscale_internal.h"
> >
> > +// Only handle width aligned to 16
> >  void ff_rgb24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> >                           uint8_t *vdst, int width, int height, int lumStride,
> >                           int chromStride, int srcStride, int32_t *rgb2yuv);
> > +
> > +static void rgb24toyv12(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
> > +                        uint8_t *vdst, int width, int height, int lumStride,
> > +                        int chromStride, int srcStride, int32_t *rgb2yuv)
> > +{
> > +    int width_align = width & (~15);
> > +
> > +    if (width_align > 0)
> > +        ff_rgb24toyv12_neon(src, ydst, udst, vdst, width_align, height,
> > +                            lumStride, chromStride, srcStride, rgb2yuv);
> > +    if (width_align < width) {
> > +        src += width_align * 3;
> > +        ydst += width_align;
> > +        udst += width_align / 2;
> > +        vdst += width_align / 2;
> > +        ff_rgb24toyv12_c(src, ydst, udst, vdst, width - width_align, height,
> > +                         lumStride, chromStride, srcStride, rgb2yuv);
> > +    }
> > +}
> > +
> >  void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
> >                                uint8_t *dest, int width, int height,
> >                                int src1Stride, int src2Stride, int dstStride);
> > @@ -42,7 +63,7 @@ av_cold void rgb2rgb_init_aarch64(void)
> >      int cpu_flags = av_get_cpu_flags();
> >
> >      if (have_neon(cpu_flags)) {
> > -        ff_rgb24toyv12 = ff_rgb24toyv12_neon;
> > +        ff_rgb24toyv12 = rgb24toyv12;
> >          interleaveBytes = ff_interleave_bytes_neon;
> >          deinterleaveBytes = ff_deinterleave_bytes_neon;
> >      }
> > diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
> > index af9434073a..7a6d621375 100644
> > --- a/tests/checkasm/sw_rgb.c
> > +++ b/tests/checkasm/sw_rgb.c
> > @@ -129,7 +129,7 @@ static int cmp_off_by_n(const uint8_t *ref, const uint8_t *test, size_t n, int a
> >
> >  static void check_rgb24toyv12(struct SwsContext *ctx)
> >  {
> > -    static const int input_sizes[] = {16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};
> > +    static const int input_sizes[] = {2, 16, 128, 540, MAX_LINE_SIZE, -MAX_LINE_SIZE};
> >
> >      LOCAL_ALIGNED_32(uint8_t, src, [BUFSIZE * 3]);
> >      LOCAL_ALIGNED_32(uint8_t, buf_y_0, [BUFSIZE]);
> > --
> > 2.42.0
> >
>
> Applied.

Thanks. One of these days I'll go over these functions and check their
required alignment and width/height, and hopefully document it better.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
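
For anyone reading the archive later, here is a minimal standalone sketch of the "aligned body plus scalar tail" split that the wrapper in the patch uses. It is not the libswscale code: luma_px, luma_rows_fast, luma_rows_scalar and luma_rows are hypothetical names, the "fast" kernel is plain C standing in for a NEON routine that only accepts widths that are multiples of 16, and it writes a single luma-like plane instead of Y/U/V. The thing it is meant to show is that both calls walk the same rows, so the strides are passed through unchanged and only the starting column moves (3 bytes per pixel on the RGB24 source, 1 byte per pixel on the output). For width 540 the aligned part is 528 and the tail is 12; for width 2 the aligned part is 0 and only the scalar path runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-pixel formula only; NOT the libswscale RGB-to-YUV math. */
static inline uint8_t luma_px(const uint8_t *p)
{
    return (uint8_t)((p[0] + 2 * p[1] + p[2]) >> 2);
}

/* Hypothetical stand-in for a SIMD kernel: requires width % 16 == 0. */
static void luma_rows_fast(const uint8_t *src, uint8_t *dst, int width,
                           int height, int dst_stride, int src_stride)
{
    assert(width % 16 == 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[(ptrdiff_t)y * dst_stride + x] =
                luma_px(src + (ptrdiff_t)y * src_stride + 3 * x);
}

/* Scalar reference: accepts any non-negative width. */
static void luma_rows_scalar(const uint8_t *src, uint8_t *dst, int width,
                             int height, int dst_stride, int src_stride)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[(ptrdiff_t)y * dst_stride + x] =
                luma_px(src + (ptrdiff_t)y * src_stride + 3 * x);
}

/* Wrapper: fast kernel on the 16-aligned columns, scalar code on the rest. */
static void luma_rows(const uint8_t *src, uint8_t *dst, int width,
                      int height, int dst_stride, int src_stride)
{
    int width_align = width & ~15;  /* round down to a multiple of 16 */

    if (width_align > 0)
        luma_rows_fast(src, dst, width_align, height, dst_stride, src_stride);
    if (width_align < width) {
        /* Shift the start column; the strides still describe full rows. */
        src += width_align * 3;     /* 3 bytes per RGB24 pixel */
        dst += width_align;         /* 1 byte per output sample */
        luma_rows_scalar(src, dst, width - width_align, height,
                         dst_stride, src_stride);
    }
}
```

In the real patch the chroma pointers advance by width_align / 2 rather than width_align, since the U and V planes of YV12 are subsampled horizontally by two. A quick way to sanity-check a wrapper of this shape is to compare it against the scalar routine on a width such as 20 or 540, which is in effect what the extra checkasm sizes do.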