On Thu, Nov 27, 2014 at 02:35:24PM +0800, rongyan wrote:
> Hi,
> We present 5 patches to fix h264 bugs for POWER8 little endian, which are
> sent in 5 separate emails.
> This is the second, to fix the functions
> h264_idct8_add_altivec();
> h264_idct_dc_add_internal();
> h264_loop_filter_luma_altivec();
> write16x4() VEC_1D_DCT();
> weight_h264_W_altivec();
> biweight_h264_W_altivec();
> VEC_LOAD_U8_ADD_S16_STORE_U8();
> ALTIVEC_STORE_SUM_CLIP();
> And add macros GET_2PERM(), dstv_load(), vdst_load(), dest_unligned_store().
>
> The fate test result after merging these 5 patches can be found on the website by
> searching "ibmcrl", and is also attached below to facilitate the review. The
> passed test cases change from 2017/2243 to 2209/2245.
>
> Thanks.
> Rong Yan
>
> ------------------
> The world has enough for everyone's need, but not enough for everyone's
> greed.
> h264dsp.c | 374 +++++++++++++++++++++++++++++++++-----------------------------
> 1 file changed, 205 insertions(+), 169 deletions(-)
> dcaccec4338f960704148c933e1ec454dd4dc6a2 0002-libavcodec-ppc-h264dsp.c-fix-h264_idct8_add_altivec.patch
> From 130b20e650a2d83a4c66cd23c10fe943742339f8 Mon Sep 17 00:00:00 2001
> From: Rong Yan <rongyan...@gmail.com>
> Date: Thu, 27 Nov 2014 05:49:53 +0000
> Subject: [PATCH 2/5] libavcodec/ppc/h264dsp.c : fix h264_idct8_add_altivec()
>  h264_idct_dc_add_internal() h264_loop_filter_luma_altivec() write16x4()
>  VEC_1D_DCT() weight_h264_W_altivec() biweight_h264_W_altivec()
>  VEC_LOAD_U8_ADD_S16_STORE_U8() ALTIVEC_STORE_SUM_CLIP() add marcos
>  GET_2PERM() dstv_load() vdst_load() dest_unligned_store() for POWER LE
>
> ---
>  libavcodec/ppc/h264dsp.c | 374 ++++++++++++++++++++++++++---------------------
>  1 file changed, 205 insertions(+), 169 deletions(-)
>
> diff --git a/libavcodec/ppc/h264dsp.c b/libavcodec/ppc/h264dsp.c
> index 7fc7e0b..cfce32d 100644
> --- a/libavcodec/ppc/h264dsp.c
> +++ b/libavcodec/ppc/h264dsp.c
> @@ -34,7 +34,7 @@
>   * IDCT transform:
>   ****************************************************************************/
>
> -#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) \
> +#define VEC_1D_DCT(vb0,vb1,vb2,vb3,va0,va1,va2,va3) {\
>      /* 1st stage */ \
>      vz0 = vec_add(vb0,vb2); /* temp[0] = Y[0] + Y[2] */ \
>      vz1 = vec_sub(vb0,vb2); /* temp[1] = Y[0] - Y[2] */ \
> @@ -46,7 +46,8 @@
>      va0 = vec_add(vz0,vz3); /* x[0] = temp[0] + temp[3] */ \
>      va1 = vec_add(vz1,vz2); /* x[1] = temp[1] + temp[2] */ \
>      va2 = vec_sub(vz1,vz2); /* x[2] = temp[1] - temp[2] */ \
> -    va3 = vec_sub(vz0,vz3) /* x[3] = temp[0] - temp[3] */
> +    va3 = vec_sub(vz0,vz3); /* x[3] = temp[0] - temp[3] */\
> +}
>
>  #define VEC_TRANSPOSE_4(a0,a1,a2,a3,b0,b1,b2,b3) \
>      b0 = vec_mergeh( a0, a0 ); \
> @@ -62,14 +63,23 @@
>      b2 = vec_mergeh( a1, a3 ); \
>      b3 = vec_mergel( a1, a3 )
>
> -#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) \
> -    vdst_orig = vec_ld(0, dst); \
> -    vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask); \
> -    vdst_ss = (vec_s16) vec_mergeh(zero_u8v, vdst); \
> -    va = vec_add(va, vdst_ss); \
> -    va_u8 = vec_packsu(va, zero_s16v); \
> -    va_u32 = vec_splat((vec_u32)va_u8, 0); \
> -    vec_ste(va_u32, element, (uint32_t*)dst);
> +#if HAVE_BIGENDIAN
> +#define vdst_load(d)\
> +    vdst_orig = vec_ld(0, dst); \
> +    vdst = vec_perm(vdst_orig, zero_u8v, vdst_mask)
> +#else
> +#define vdst_load(d)\
> +    vdst = vec_vsx_ld(0, dst)
> +#endif
> +
> +#define VEC_LOAD_U8_ADD_S16_STORE_U8(va) {\
> +    vdst_load();\
> +    vdst_ss = (vec_s16) VEC_MERGEH(zero_u8v, vdst);\
> +    va = vec_add(va, vdst_ss);\
> +    va_u8 = vec_packsu(va, zero_s16v);\
> +    va_u32 = vec_splat((vec_u32)va_u8, 0);\
> +    vec_ste(va_u32, element, (uint32_t*)dst);\
> +}

please don't mix whitespace changes with functional changes.
this makes the patch and commit unreadable.
it can also cause problems for other developers, as rebasing their work
becomes harder if the code changed a lot.

please leave the whitespace in place

git show HEAD^^^ -w --stat
 libavcodec/ppc/h264dsp.c | 106 +++++++++++++++++++++++++++++++---------------
 1 file changed, 71 insertions(+), 35 deletions(-)

git show HEAD^^^ --stat
 libavcodec/ppc/h264dsp.c | 374 +++++++++++++++++++++++++---------------------
 1 file changed, 205 insertions(+), 169 deletions(-)

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel