Re: [Beignet] [PATCH] Remove global offset need divide by local size restriction.

Sun, Yi Wed, 25 Sep 2013 23:19:42 -0700

Sorry for the late reply; I have been traveling.
I can trigger this bug as well.


Thanks
  --Sun, Yi
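
For readers following the thread, the index arithmetic that the quoted patch changes can be sketched on the host side. This is only an illustrative model, not Beignet code: `range_1d`, `model_global_id` and `ids_are_contiguous` are hypothetical names, and the sketch assumes a single dimension whose global size is a multiple of the local size. It shows that once the walker starts at 0 and the offset is added in get_global_id, the offset no longer has to be a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of one NDRange dimension (not Beignet code). */
typedef struct {
    size_t global_offset; /* plays the role of global_work_offset[dim] */
    size_t local_size;    /* plays the role of local_work_size[dim] */
    size_t num_groups;    /* global_work_size[dim] / local_work_size[dim] */
} range_1d;

/* Mirrors the patched get_global_id(): group ids start at 0 because the
 * walker offset is 0, and the global offset is added here instead. */
static size_t model_global_id(const range_1d *r, size_t group_id, size_t local_id)
{
    return local_id + r->local_size * group_id + r->global_offset;
}

/* Returns 1 if walking every (group, local) pair in order yields
 * offset, offset + 1, offset + 2, ... with no gaps; 0 otherwise. */
static int ids_are_contiguous(const range_1d *r)
{
    size_t expected = r->global_offset;
    for (size_t g = 0; g < r->num_groups; ++g)
        for (size_t l = 0; l < r->local_size; ++l)
            if (model_global_id(r, g, l) != expected++)
                return 0;
    return 1;
}
```

With an offset of 3 and a local size of 4 (3 is not a multiple of 4), the model still enumerates ids 3 through 10 for two work-groups, which is the case the removed check in clEnqueueNDRangeKernel used to reject.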

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Xing, Homer
> Sent: Wednesday, September 25, 2013 10:34 PM
> To: Zhigang Gong; Yang, Rong R
> Cc: [email protected]
> Subject: Re: [Beignet] [PATCH] Remove global offset need divide by local size
> restriction.
> 
> I can take this bug, but I currently don't have enough information.
> 
> Zhigang, do you have more details on how to track down a dependency bug in
> this code path?
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Zhigang Gong
> Sent: Thursday, September 26, 2013 10:43 AM
> To: Yang, Rong R
> Cc: [email protected]
> Subject: Re: [Beignet] [PATCH] Remove global offset need divide by local size
> restriction.
> 
> Hi guys,
> 
> I just pushed this patch, and it causes a regression as below:
> 
> compiler_abs_diff_long16:
>   compiler_abs_diff_long16()    [FAILED]
>     Error: !memcmp(buf_data[2], cpu_diff, sizeof(T) * n)
>   at file /home/gongzg/git/fdo/beignet/utests/compiler_abs_diff.cpp, function
> compiler_abs_diff_with_type, line 177
> 
> I took a quick look and found that the root cause of this regression is not
> in this patch: when I disable OCL_PRE_ALLOC_INSN, it works fine.
> To me, the most likely root cause is a dependency bug within the
> abs_diff_long code path.
> 
> I will be on vacation from tomorrow and will be back at work on 8/10. If
> anybody on the list can take this bug, I would appreciate it. Thanks
> 
> - Zhigang
> 
> On Mon, Sep 23, 2013 at 02:04:08PM +0800, Yang Rong wrote:
> > Set the global offset to 0 in the walker, and add the global offset in
> > get_global_id.
> >
> > Signed-off-by: Yang Rong <[email protected]>
> > ---
> >  backend/src/ocl_stdlib.tmpl.h | 2 +-
> >  src/cl_api.c                  | 7 -------
> >  src/intel/intel_gpgpu.c       | 6 +++---
> >  3 files changed, 4 insertions(+), 11 deletions(-)
> >
> > diff --git a/backend/src/ocl_stdlib.tmpl.h b/backend/src/ocl_stdlib.tmpl.h
> > index 9b76ba1..4921ee4 100644
> > --- a/backend/src/ocl_stdlib.tmpl.h
> > +++ b/backend/src/ocl_stdlib.tmpl.h
> > @@ -588,7 +588,7 @@ DECL_PUBLIC_WORK_ITEM_FN(get_num_groups, 1)
> >  #undef DECL_PUBLIC_WORK_ITEM_FN
> >
> >  INLINE uint get_global_id(uint dim) {
> > -  return get_local_id(dim) + get_local_size(dim) * get_group_id(dim);
> > +  return get_local_id(dim) + get_local_size(dim) * get_group_id(dim) + get_global_offset(dim);
> >  }
> >
> >
> >  /////////////////////////////////////////////////////////////////////////////
> > diff --git a/src/cl_api.c b/src/cl_api.c
> > index b60160b..c19b80a 100644
> > --- a/src/cl_api.c
> > +++ b/src/cl_api.c
> > @@ -2264,19 +2264,12 @@ clEnqueueNDRangeKernel(cl_command_queue command_queue,
> >      goto error;
> >    }
> >
> > -  /* Check offset values. We add a non standard restriction. The offsets must
> > -   * also be evenly divided by the local sizes
> > -   */
> >    if (global_work_offset != NULL)
> >      for (i = 0; i < work_dim; ++i) {
> >        if (UNLIKELY(~0LL - global_work_offset[i] > global_work_size[i])) {
> >          err = CL_INVALID_GLOBAL_OFFSET;
> >          goto error;
> >        }
> > -      if (UNLIKELY(local_work_size != NULL && global_work_offset[i] % local_work_size[i])) {
> > -        err = CL_INVALID_GLOBAL_OFFSET;
> > -        goto error;
> > -      }
> >      }
> >
> >    /* Local sizes must be non-null and divide global sizes */
> > diff --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c
> > index 7b82b76..44f44ef 100644
> > --- a/src/intel/intel_gpgpu.c
> > +++ b/src/intel/intel_gpgpu.c
> > @@ -886,11 +886,11 @@ intel_gpgpu_walker(intel_gpgpu_t *gpgpu,
> >      OUT_BATCH(gpgpu->batch, (1 << 30) | (thread_n-1)); /* SIMD16 | thread max */
> >    else
> >      OUT_BATCH(gpgpu->batch, (0 << 30) | (thread_n-1)); /* SIMD8  | thread max */
> > -  OUT_BATCH(gpgpu->batch, global_wk_off[0]);
> > +  OUT_BATCH(gpgpu->batch, 0);
> >    OUT_BATCH(gpgpu->batch, global_wk_dim[0]);
> > -  OUT_BATCH(gpgpu->batch, global_wk_off[1]);
> > +  OUT_BATCH(gpgpu->batch, 0);
> >    OUT_BATCH(gpgpu->batch, global_wk_dim[1]);
> > -  OUT_BATCH(gpgpu->batch, global_wk_off[2]);
> > +  OUT_BATCH(gpgpu->batch, 0);
> >    OUT_BATCH(gpgpu->batch, global_wk_dim[2]);
> >    OUT_BATCH(gpgpu->batch, right_mask);
> >    OUT_BATCH(gpgpu->batch, ~0x0);                     /* we always set height as 1, so set bottom mask as all 1*/
> > --
> > 1.8.1.2
> >
> > _______________________________________________
> > Beignet mailing list
> > [email protected]
> > http://lists.freedesktop.org/mailman/listinfo/beignet
