Hi Tom,
I see that you're reviewing the libgomp changes. Please disregard the
following hunk:
On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>  					default_dims[GOMP_DIM_VECTOR]);
>      }
>    pthread_mutex_unlock (&ptx_dev_lock);
> +  int vectors = default_dims[GOMP_DIM_VECTOR];
> +  int workers = default_dims[GOMP_DIM_WORKER];
> +  int gangs = default_dims[GOMP_DIM_GANG];
> +
> +  if (nvptx_thread()->ptx_dev->driver_version > 6050)
> +    {
> +      int grids, blocks;
> +      CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
> +                        &blocks, function, NULL, 0,
> +                        dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
> +      GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
> +                         "grid = %d, block = %d\n", grids, blocks);
> +
> +      gangs = grids * dev_size;
> +      workers = blocks / vectors;
> +    }
I revisited this change yesterday and noticed that it was setting gangs
incorrectly. Basically, gangs should be set as follows:

  gangs = grids * (blocks / warp_size);

or, to stay closer to og8, as

  gangs = 2 * grids * (blocks / warp_size);

The magic constant 2 is there to prevent thread starvation; it's the
same idea as running make -j<2*#threads>.
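To make the arithmetic concrete, here is a tiny self-contained sketch
(the helper name is mine; grids and blocks stand for the values
returned by cuOccupancyMaxPotentialBlockSize, and warp_size for the
device's warp size):

  /* Sketch only: derive launch dimensions from the occupancy hint.
     workers keeps the block size the driver suggested; gangs is
     oversubscribed by 2x so the SMs don't run out of work.  */
  static void
  choose_launch_dims (int grids, int blocks, int warp_size, int vectors,
                      int *gangs, int *workers)
  {
    *workers = blocks / vectors;
    *gangs = 2 * grids * (blocks / warp_size);
  }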
Anyway, I'm still experimenting with that change. There are still some
discrepancies between the way I select num_workers and the way the
driver does it. The driver appears to be a little more conservative,
but according to the thread occupancy calculator, that should yield
greater performance on GPUs.
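If it helps with the comparison, one way to cross-check a given
selection against the driver is to ask it how many blocks of that size
fit per multiprocessor, e.g. (just a sketch inside the plugin, reusing
its CUDA_CALL_ASSERT wrapper; function is the kernel being launched):

  /* Ask the driver how many blocks of size workers * vectors it can
     keep resident per SM, and log it for comparison.  */
  int blocks_per_sm;
  CUDA_CALL_ASSERT (cuOccupancyMaxActiveBlocksPerMultiprocessor,
                    &blocks_per_sm, function, workers * vectors, 0);
  GOMP_PLUGIN_debug (0, "occupancy: %d blocks/SM for block size %d\n",
                     blocks_per_sm, workers * vectors);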
I just wanted to give you a heads up because you seem to be working on this.
Thanks for all of your reviews!
By the way, are you now the maintainer of the libgomp nvptx plugin?
Cesar