I found that it will break this optimization patchset, so this patchset LGTM. Will push later. Thanks.

On Mon, Mar 10, 2014 at 10:23:42AM +0800, Zhigang Gong wrote:
> What's the side effect for the original implementation?
> IMO, by default, a load on an int4 vector is better than
> 3 loads on int scalar. Right?
>
> On Fri, Mar 07, 2014 at 01:48:46PM +0800, Ruiling Song wrote:
> > clang will align the vec3 load into vec4. we have to do it in frontend.
> >
> > Signed-off-by: Ruiling Song <[email protected]>
> > ---
> >  backend/src/ocl_stdlib.tmpl.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/backend/src/ocl_stdlib.tmpl.h b/backend/src/ocl_stdlib.tmpl.h
> > index 76395fa..e3ac632 100755
> > --- a/backend/src/ocl_stdlib.tmpl.h
> > +++ b/backend/src/ocl_stdlib.tmpl.h
> > @@ -3854,12 +3854,12 @@ INLINE_OVERLOADABLE void vstore3(TYPE##3 v, size_t offset, SPACE TYPE *p) {\
> >    *(p + 3 * offset + 2) = v.s2; \
> >  } \
> >  INLINE_OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \
> > -  return *(SPACE TYPE##3 *) (p + 3 * offset); \
> > +  return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\
> >  }
> >
> >  #define DECL_UNTYPED_RDV3_SPACE(TYPE, SPACE) \
> >  INLINE_OVERLOADABLE TYPE##3 vload3(size_t offset, const SPACE TYPE *p) { \
> > -  return *(SPACE TYPE##3 *) (p + 3 * offset); \
> > +  return (TYPE##3)(*(p + 3 * offset), *(p+ 3 * offset + 1), *(p + 3 * offset + 2));\
> >  }
> >
> >  #define DECL_UNTYPED_RW_ALL_SPACE(TYPE, SPACE) \
> > --
> > 1.7.9.5

_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet
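
For readers following the diff above, here is a minimal plain-C sketch of what the patched vload3 does: it reads exactly three consecutive scalars from a packed array instead of dereferencing a 3-component vector pointer. In OpenCL C a 3-component vector type has the size and alignment of the 4-component type, which is presumably why the pointer-cast version ends up as a vec4 load. The names `int3_t`, `vload3_scalar`, and `vstore3_scalar` are hypothetical stand-ins (plain C has no vector types); they are not part of Beignet.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int3; not a Beignet type. */
typedef struct { int s0, s1, s2; } int3_t;

/* Element-wise load, mirroring the patched macro body:
 * touches only elements 3*offset .. 3*offset + 2 of p. */
static int3_t vload3_scalar(size_t offset, const int *p)
{
    const int *base = p + 3 * offset;
    int3_t v = { base[0], base[1], base[2] };
    return v;
}

/* Element-wise store, mirroring the existing vstore3 in the diff context. */
static void vstore3_scalar(int3_t v, size_t offset, int *p)
{
    int *base = p + 3 * offset;
    base[0] = v.s0;
    base[1] = v.s1;
    base[2] = v.s2;
}
```

With a packed buffer of six ints, `vload3_scalar(1, buf)` reads elements 3, 4, and 5 and never touches memory past the end of the buffer, whereas a 16-byte load at the same address would read one element beyond it.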
