On 2018/12/6 2:16 AM, Maciej W. Rozycki wrote:
AFAIK, things like strlen are already available in iso_c_binding, in forms
like "C_strlen".
Can you check again if that 'openacc_c_string' module is really necessary?
  Any pointers please?

  I can't see `c_strlen' or any equivalent interface defined either in the
Fortran 2003 language standard or in GCC documentation, nor does `grep'
over the GCC tree show anything relevant.  The `iso_c_binding' module
defines only a handful of procedures according to said documentation.  The
`strlen' function provided here has been taken from one of our Fortran
test cases, which strongly indicates there's no such API already available,
or whoever wrote the test case would have chosen to use it, I suppose.

Okay I see. I think I mixed up the common convention with the actual interface
standard.

+           CUcontext new_ctx;
+
+           CUDA_CALL_ERET (propval, cuCtxCreate, &new_ctx, CU_CTX_SCHED_AUTO,
+                           dev);
+           CUDA_CALL_ERET (propval, cuMemGetInfo, &free_mem, &total_mem);
+           CUDA_CALL_ASSERT (cuCtxDestroy, new_ctx);
+         }
(I'm CCing Tom here, as he is maintainer for these parts)

As we discussed earlier on our internal list, I think properly using
GOMP_OFFLOAD_init_device is the right way, instead of using the lower-level
CUDA context create/destroy.

I did not mean for you to first init the device and then immediately destroy
it with GOMP_OFFLOAD_fini_device just to obtain the property, but rather to
take the opportunity to initialize it for use and leave it there until
program exit. That should save resources overall.
(BTW, CUDA contexts should be quite expensive to create/destroy, so using a
cuCtxCreate/Destroy pair is probably almost as slow.)
  I have argued that this looks like a corner case to me, as querying for
the remaining (rather than total) memory available to a device that hasn't
(yet) been used seems of hardly any use, because obviously at such a stage
no memory has been used.  The OpenACC standard does require us to handle
such a request somehow, with returning 0 being another option; however, I
thought we may well have a quick peek without pulling in all the state.
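
For illustration only, the two approaches under discussion can be sketched
with stand-in functions.  Everything named `fake_*' and `query_free_*' below
is hypothetical and is neither the CUDA driver API nor the libgomp plugin
interface; the sketch only counts how often a context gets created.
`query_free_temp' mirrors the create/query/destroy sequence of the quoted
hunk, and `query_free_persistent' initializes once and leaves the context
in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the real CUDA driver API.  */
struct fake_device
{
  int ctx_live;     /* Nonzero while a context exists.  */
  int ctx_creates;  /* Number of context creations so far.  */
  size_t free_mem;  /* Value reported by the memory query.  */
};

static void
fake_ctx_create (struct fake_device *d)
{
  assert (!d->ctx_live);
  d->ctx_live = 1;
  d->ctx_creates++;
}

static void
fake_ctx_destroy (struct fake_device *d)
{
  assert (d->ctx_live);
  d->ctx_live = 0;
}

/* Like the real query, this stand-in needs a live context.  */
static size_t
fake_mem_get_free (const struct fake_device *d)
{
  assert (d->ctx_live);
  return d->free_mem;
}

/* Quick peek: create a temporary context, query, destroy it again.  */
static size_t
query_free_temp (struct fake_device *d)
{
  if (d->ctx_live)
    return fake_mem_get_free (d);

  fake_ctx_create (d);
  size_t n = fake_mem_get_free (d);
  fake_ctx_destroy (d);
  return n;
}

/* Initialize on the first query and keep the context for later use.  */
static size_t
query_free_persistent (struct fake_device *d)
{
  if (!d->ctx_live)
    fake_ctx_create (d);
  return fake_mem_get_free (d);
}
```

With these stand-ins, calling `query_free_temp' twice on an unused device
creates two contexts and leaves none behind, whereas calling
`query_free_persistent' twice creates one and keeps it alive.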

  I guess I have no strong opinion either way and I can adapt accordingly.

  NB that would have to be `gomp_init_device' rather than
`GOMP_OFFLOAD_init_device' AFAICS.

You'll have to use GOMP_OFFLOAD_init_device; as you are inside the plugin,
gomp_init_device() should not be available.

However, looking into this further, the checking conventions of
GOMP_OFFLOAD_init_device will have to be slightly tweaked to accommodate
possible further initialization from libgomp proper, so this may require a
longer string of changes... I think it's not worth it, or it can be adjusted
later. I now think your current approach with the CUDA contexts is okay.

I think the patch is okay, although it still needs approval from Thomas and
Tom to commit.

Thanks,
Chung-Lin
