On Sat, Jan 21, 2017 at 03:50:43PM +0100, Thomas Schwinge wrote:
> > In order to configure gcc to load libcuda.so.1 dynamically,
> > one has to either configure it --without-cuda-driver, or without
> > --with-cuda-driver=/--with-cuda-driver-lib=/--with-cuda-driver-include=
> > options if cuda.h and -lcuda aren't found in the default locations.
> 
> Would be good to have that documented ;-) -- done.

Thanks.

> I (obviously) agree with your intended (?) "--without-cuda-driver"
> semantics, but I think a "--with-cuda-driver" option should actually mean
> that the system's/installed CUDA driver package *must* be used (and
> similar for other "--with-cuda-driver*" options); and I also added
> "--with-cuda-driver=check" to allow overriding earlier such options (that
> is, restore the default "check" behavior).
> 
> I say 'intended (?) "--without-cuda-driver" semantics', because with your
> current patch/code, if I got that right, if one specifies
> "--without-cuda-driver" but actually does have a CUDA driver system
> installation available, then the nvptx libgomp plugin will still link
> against that one, instead of "dlopen"ing it.  So I changed that
> accordingly.

Agreed.

> > +PLUGIN_NVPTX_DYNAMIC=0
> 
> I find the name "PLUGIN_NVPTX_DYNAMIC" a bit misleading, as this isn't
> about the nvptx plugin being "dynamic" but rather it's about its usage of
> the CUDA driver library.  Thus renamed to "CUDA_DRIVER_DYNAMIC".

Ack.

> > --- libgomp/plugin/plugin-nvptx.c.jj        2017-01-13 12:07:56.000000000 
> > +0100
> > +++ libgomp/plugin/plugin-nvptx.c   2017-01-13 18:00:39.693284346 +0100
> 
> > +/* -1 if init_cuda_lib has not been called yet, false
> > +   if it has been and failed, true if it has been and succeeded.  */
> > +static char cuda_lib_inited = -1;
> 
> Don't we actually have to worry here about multiple threads running into
> this in parallel -- thus need locking (or atomic accesses?) when
> accessing "cuda_lib_inited"?

I thought it is only accessed when a lock is held, but I could be wrong.
Also, please se my question about why we ever call cuInit in nvptx_init
(whether nvptx_get_num_devices doesn't have to be called first).

> > +/* Dynamically load the CUDA runtime library and initialize function
> 
> Not "CUDA runtime" but actually "CUDA driver" -- changed.

Ok.

> I'd like some GOMP_PLUGIN_debug output for this and the following "return
> false" cases -- added.

Ok.

> > --- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 15:58:00.966544147 +0100
> > +++ libgomp/plugin/cuda/cuda.h      2017-01-13 17:02:47.355817896 +0100
> 
> > +#define CUDA_VERSION 8000
> 
> Does that make it compatible to CUDA 8.0 (and later) only?  (Not yet
> checked.)

The only reason for that is
#if CUDA_VERSION < 7000
  /* Specified in documentation and present in library from at least
     5.5.  Not declared in header file prior to 7.0.  */
  extern CUresult cuGetErrorString (CUresult, const char **);
#endif
I wanted to make it clear that cuGetErrorString prototype is provided.

I must say I don't know enough about ABI and API incompatibilities between
different CUDA versions, I presume functions with defines like:
#define cuLinkCreate cuLinkCreate_v2
at some point weren't using the _v2 suffixes, but have no idea if they had
different arguments or what.  Perhaps that would be supportable by having
some fallback if for those dlsym fails or something.

> @@ -48,26 +49,44 @@ AC_SUBST(CUDA_DRIVER_LIB)
>  CUDA_DRIVER_CPPFLAGS=
>  CUDA_DRIVER_LDFLAGS=
>  AC_ARG_WITH(cuda-driver,
> +     [AS_HELP_STRING([--without-cuda-driver],
> +             [do not use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
> +     [AS_HELP_STRING([--with-cuda-driver=check],
> +             [use the system's CUDA driver package, if usable [default]])])
> +AC_ARG_WITH(cuda-driver,
> +     [AS_HELP_STRING([--with-cuda-driver],
> +             [use the system's CUDA driver package])])
> +AC_ARG_WITH(cuda-driver,
>       [AS_HELP_STRING([--with-cuda-driver=PATH],
> -             [specify prefix directory for installed CUDA driver package.
> -              Equivalent to --with-cuda-driver-include=PATH/include
> -              plus --with-cuda-driver-lib=PATH/lib])])
> +             [use installed CUDA driver package, and specify prefix
> +             directory.  Equivalent to
> +             --with-cuda-driver-include=PATH/include plus
> +             --with-cuda-driver-lib=PATH/lib])],
> +     [],
> +     [with_cuda_driver=check])

I admit my autoconf knowledge is limited, but it looks certainly strange
to have several AC_ARG_WITH for the same option.  Shouldn't we use
one AC_ARG_WITH(cuda-driver,
with multiple AS_HELP_STRING inside of its second argument?

>  AC_ARG_WITH(cuda-driver-include,
>       [AS_HELP_STRING([--with-cuda-driver-include=PATH],
> -             [specify directory for installed CUDA driver include files])])
> +             [use installed CUDA driver package, and specify directory for
> +             include files])])
>  AC_ARG_WITH(cuda-driver-lib,
>       [AS_HELP_STRING([--with-cuda-driver-lib=PATH],
> -             [specify directory for the installed CUDA driver library])])
> +             [use installed CUDA driver package, and specify directory for
> +             libraries])])
>  case "x$with_cuda_driver" in
> -  x | xno) ;;
> -  *) CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> -     CUDA_DRIVER_LIB=$with_cuda_driver/lib
> -     ;;
> +  xcheck | xno | xyes)
> +    ;;
> +  *)
> +    CUDA_DRIVER_INCLUDE=$with_cuda_driver/include
> +    CUDA_DRIVER_LIB=$with_cuda_driver/lib
> +    ;;
>  esac
>  if test "x$with_cuda_driver_include" != x; then
> +  CUDA_DRIVER_DYNAMIC=0
>    CUDA_DRIVER_INCLUDE=$with_cuda_driver_include
>  fi
>  if test "x$with_cuda_driver_lib" != x; then
> +  CUDA_DRIVER_DYNAMIC=0
>    CUDA_DRIVER_LIB=$with_cuda_driver_lib
>  fi
>  if test "x$CUDA_DRIVER_INCLUDE" != x; then
> @@ -76,12 +95,22 @@ fi
>  if test "x$CUDA_DRIVER_LIB" != x; then
>    CUDA_DRIVER_LDFLAGS=-L$CUDA_DRIVER_LIB
>  fi
> +case "x$with_cuda_driver" in
> +  xcheck)
> +    CUDA_DRIVER_DYNAMIC=check
> +    ;;
> +  xno)
> +    CUDA_DRIVER_DYNAMIC=1
> +    ;;
> +  xyes | *)
> +    CUDA_DRIVER_DYNAMIC=0
> +    ;;
> +esac

Why two separate case constructs?  Can't you do what you do in the second
in the second instead of that
> +  xcheck | xno | xyes)
> +    ;;
and just add CUDA_DRIVER_DYNAMIC=0 also to the *) case?
> +     case $CUDA_DRIVER_DYNAMIC in
> +       1)
> +         PLUGIN_NVPTX=1
> +         ;;
> +       check | 0)

Wouldn't it be far simpler to just use
        PLUGIN_NVPTX=1
        if test $CUDA_DRIVER_DYNAMIC != 1; then

> +         # Determine whether the system's CUDA driver package is usable.
> +         PLUGIN_NVPTX=0
> +
> +         # Tentatively point to the system's CUDA driver package.
> +         PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
> +         PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
> +         PLUGIN_NVPTX_LIBS=-lcuda
> +
> +         PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
> +         CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
> +         PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
> +         LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
> +         PLUGIN_NVPTX_save_LIBS=$LIBS
> +         LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
> +         AC_LINK_IFELSE(
> +           [AC_LANG_PROGRAM(
> +             [#include "cuda.h"],
> +               [CUresult r = cuCtxPushCurrent (NULL);])],
> +           [PLUGIN_NVPTX=1])
> +         CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
> +         LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
> +         LIBS=$PLUGIN_NVPTX_save_LIBS

        fi

> +         ;;
> +       *)
> +         AC_MSG_ERROR([internal error])
> +         ;;
> +     esac

and drop the above?

> +     case $CUDA_DRIVER_DYNAMIC:$PLUGIN_NVPTX in
> +       check:0)
> +         CUDA_DRIVER_DYNAMIC=1
> +         PLUGIN_NVPTX=1
> +         ;;
> +       check:1)
> +         CUDA_DRIVER_DYNAMIC=0
> +         ;;
> +       0:1 | 1:1)
> +         ;;
> +       0:0)
> +         AC_MSG_ERROR([CUDA driver package not usable])
> +         ;;
> +       *)
> +         AC_MSG_ERROR([internal error])
> +         ;;
>       esac

This is fine.

> +     if test $CUDA_DRIVER_DYNAMIC = 1; then

Here you use very similar test rather than case (just an argument
why IMHO case is unnecessary).  But not a big deal for me.

> +  const char *cuda_driver_lib = "libcuda.so.1";
> +  void *h = dlopen (cuda_driver_lib, RTLD_LAZY);

Note the HSAIL plugin uses secure_getenv and allows an env var to override
the location of the runtime library.  Dunno if we don't want to do that too,
but in any case, it can be done incrementally.

Otherwise LGTM (and thanks for testing it and patch).

        Jakub

Reply via email to