Hi!

> On 24 Jan 2021, at 17:35, Andriy Gelman <andriy.gel...@gmail.com> wrote:
> 
> On Sat, 07. Nov 16:31, Andriy Gelman wrote:
>> On Sat, 31. Oct 10:17, Andriy Gelman wrote:
>>> On Fri, 16. Oct 00:02, Andriy Gelman wrote:
>>>> On Fri, 09. Oct 20:17, Andriy Gelman wrote:
>>>>> From: Chip Kerchner <chip.kerch...@ibm.com>
>>>>> 
>>>>> ---
>>>>> libswscale/ppc/yuv2rgb_altivec.c | 10 ++++++++++
>>>>> 1 file changed, 10 insertions(+)
>>>>> 
>>>>> diff --git a/libswscale/ppc/yuv2rgb_altivec.c b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> index 536545293d..930ef6b98f 100644
>>>>> --- a/libswscale/ppc/yuv2rgb_altivec.c
>>>>> +++ b/libswscale/ppc/yuv2rgb_altivec.c
>>>>> @@ -283,6 +283,16 @@ static inline void cvtyuvtoRGB(SwsContext *c, vector signed short Y,
>>>>>  * ------------------------------------------------------------------------------
>>>>>  */
>>>>> 
>>>>> +#if !HAVE_VSX
>>>>> +static inline vector unsigned char vec_xl(signed long long offset, const ubyte *addr)
>>>>> +{
>>>>> +    const vector unsigned char *v_addr = (const vector unsigned char *) (addr + offset);
>>>>> +    vector unsigned char align_perm = vec_lvsl(offset, addr);
>>>>> +
>>>>> +    return (vector unsigned char) vec_perm(v_addr[0], v_addr[1], align_perm);
>>>>> +}
>>>>> +#endif /* !HAVE_VSX */
>>>>> +
>>>>> #define DEFCSP420_CVT(name, out_pixels)                                       \
>>>>> static int altivec_ ## name(SwsContext *c, const unsigned char **in,          \
>>>>>                             int *instrides, int srcSliceY, int srcSliceH,     \
>>>> 
>>>> Ping.
>>>> This patch fixes PPC64 build on patchwork.
>>>> 
>>> 
>>> ping
>>> 
>> 
>> ping 
>> 
> 
> I asked Reimar to have a look at the patch. He didn't have a link to the
> original post, so I'm forwarding his feedback:
> 
> - a few comments sure would be good
> - the function likely should be always_inline
> - the offset is always 0 in the code, so that could be optimized
> - I am not sure if the addresses even can be unaligned (the whole magic code
>   is about fixing up unaligned loads, since AltiVec simply ignores the low
>   bits of the address, but it looks like the x86 asm also assumes unaligned)
> - the extra load needed to fix the alignment can read outside the bounds of
>   the buffer, I think? Especially for chroma, and if the load is aligned.
>   Though chroma seems to have an issue already today...
> 

I had a look at the code before vec_xl was introduced, and besides the
unnecessary extra offset it seems it was pretty much like this.
So worst case this patch restores the behaviour to before the vec_xl patch,
which I guess is a reasonable argument for merging it whether it's perfect
or not.
Personally I would have added a vecload function (dropping the offset
argument) or similar that does either vec_xl or this, instead of introducing
a “fake” vec_xl function, but that is just nitpicking I guess…

Best regards,
Reimar

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
