Hi!

> On 24 Jan 2021, at 17:35, Andriy Gelman <andriy.gel...@gmail.com> wrote:
>
> On Sat, 07. Nov 16:31, Andriy Gelman wrote:
>> On Sat, 31. Oct 10:17, Andriy Gelman wrote:
>>> On Fri, 16. Oct 00:02, Andriy Gelman wrote:
>>>> On Fri, 09. Oct 20:17, Andriy Gelman wrote:
>>>>> From: Chip Kerchner <chip.kerch...@ibm.com>
>>>>>
>>>>> ---
>>>>>  libswscale/ppc/yuv2rgb_altivec.c | 10 ++++++++++
>>>>>  1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/libswscale/ppc/yuv2rgb_altivec.c b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> index 536545293d..930ef6b98f 100644
>>>>> --- a/libswscale/ppc/yuv2rgb_altivec.c
>>>>> +++ b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> @@ -283,6 +283,16 @@ static inline void cvtyuvtoRGB(SwsContext *c, vector signed short Y,
>>>>>   * ------------------------------------------------------------------------------
>>>>>   */
>>>>>
>>>>> +#if !HAVE_VSX
>>>>> +static inline vector unsigned char vec_xl(signed long long offset, const ubyte *addr)
>>>>> +{
>>>>> +    const vector unsigned char *v_addr = (const vector unsigned char *) (addr + offset);
>>>>> +    vector unsigned char align_perm = vec_lvsl(offset, addr);
>>>>> +
>>>>> +    return (vector unsigned char) vec_perm(v_addr[0], v_addr[1], align_perm);
>>>>> +}
>>>>> +#endif /* !HAVE_VSX */
>>>>> +
>>>>>  #define DEFCSP420_CVT(name, out_pixels)                                       \
>>>>>  static int altivec_ ## name(SwsContext *c, const unsigned char **in,          \
>>>>>                              int *instrides, int srcSliceY, int srcSliceH,     \
>>>>
>>>> Ping.
>>>> This patch fixes the PPC64 build on patchwork.
>>>>
>>>
>>> ping
>>>
>>
>> ping
>>
>
> I asked Reimar to have a look at the patch.
> He didn't have a link to the original post, so I'm forwarding his feedback:
>
> - a few comments sure would be good
> - the function likely should be always_inline
> - the offset is always 0 in the code, so that could be optimized
> - I am not sure if the addresses even can be unaligned (the whole magic code
>   is about fixing up unaligned loads, since AltiVec simply ignores the low
>   bits of the address, but it looks like the x86 asm also assumes unaligned)
> - the extra load needed to fix the alignment can read outside the bounds of
>   the buffer, I think? Especially for chroma, and if the load is aligned.
>   Though chroma seems to have an issue already today...
>
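For anyone not familiar with the "magic code" mentioned above: here is a minimal, portable C sketch of the two-aligned-loads-plus-permute trick that vec_lvsl()/vec_perm() implement and that the fallback vec_xl() in the patch reproduces. The helper name is hypothetical; this just models the technique on scalar hardware, it is not the AltiVec code itself. Note how the second aligned load reads up to 16 bytes past the requested data, which is exactly the out-of-bounds concern raised in the feedback.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of an AltiVec unaligned 16-byte load.
 * AltiVec loads ignore the low 4 bits of the address, so an unaligned
 * load must be built from two aligned loads plus a byte permute:
 * vec_lvsl() encodes the misalignment, vec_perm() selects the bytes. */
static void unaligned_load16(const unsigned char *addr, unsigned char out[16])
{
    /* Aligned base: clear the low 4 bits, as a real AltiVec load would. */
    const unsigned char *base =
        (const unsigned char *)((uintptr_t)addr & ~(uintptr_t)15);
    unsigned shift = (uintptr_t)addr & 15;   /* what vec_lvsl() returns */

    unsigned char lo[16], hi[16];
    memcpy(lo, base, 16);        /* first aligned "vector" load  */
    memcpy(hi, base + 16, 16);   /* second aligned "vector" load:
                                    reads past addr+16 even when
                                    shift == 0 (the OOB concern)  */

    /* The permute: bytes shift..shift+15 across the two vectors. */
    for (int i = 0; i < 16; i++)
        out[i] = (shift + i < 16) ? lo[shift + i] : hi[shift + i - 16];
}
```

With shift == 0 the permute degenerates to copying lo[] unchanged, which is why the always-zero offset in the FFmpeg call sites could be optimized away, as suggested above.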
I had a look at the code before vec_xl was introduced, and besides the
unnecessary extra offset it seems it was pretty much like this. So in the worst
case this patch restores the behaviour from before the vec_xl patch, which I
guess is a reasonable argument to merge it whether it's perfect or not.

Personally I would have added a vecload (dropping the offset argument) or
similar function that either does vec_xl or this, instead of introducing a
"fake" vec_xl function, but that is just nitpicking I guess…

Best regards,
Reimar

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".