https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753

--- Comment #17 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Ah, I forgot that mapping macro...

So this is the correct code:

    mytensor t;

    t.data=(double*)omp_target_alloc(sizeof(double)*20,omp_get_default_device());
    t.strides=(int*)omp_target_alloc(sizeof(int)*2,omp_get_default_device());
    t.extents=(int*)omp_target_alloc(sizeof(int)*2,omp_get_default_device());

    #pragma omp target enter data map(to:t)

    #pragma omp target teams distribute
    for(int i=1; i<20; i++)
    {
        t.data[i]=20;
    }

    omp_target_free(t.data,omp_get_default_device());
    omp_target_free(t.strides,omp_get_default_device());
    omp_target_free(t.extents,omp_get_default_device());
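
The snippet above leaves out the struct definition and never reads the result back, so here is a minimal, self-contained sketch of the same pattern in plain C. Several things in it are assumptions and not taken from the original post: the layout of mytensor, the helper name fill_on_device, the read-back through omp_target_memcpy, the added "target exit data map(release: t)", and the host-only stand-ins that let the file build without -fopenmp. It does not use any custom mapper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Host-only stand-ins (an assumption of this sketch) so the file also
   builds without -fopenmp; the OpenMP pragmas are then ignored. */
static int omp_get_default_device(void) { return 0; }
static int omp_get_initial_device(void) { return 0; }
static void *omp_target_alloc(size_t size, int dev) { (void)dev; return malloc(size); }
static void omp_target_free(void *p, int dev) { (void)dev; free(p); }
static int omp_target_memcpy(void *dst, const void *src, size_t len,
                             size_t dst_off, size_t src_off,
                             int dst_dev, int src_dev)
{
    (void)dst_dev; (void)src_dev;
    memcpy((char *)dst + dst_off, (const char *)src + src_off, len);
    return 0;
}
#endif

/* Hypothetical layout: the original post does not show mytensor. */
typedef struct {
    double *data;
    int *strides;
    int *extents;
} mytensor;

/* Allocates the tensor buffers on the default device, sets elements
   1..n-1 to 20 in a target region, and copies all n elements back
   into host_out. Returns 0 on success, 1 on failure. */
int fill_on_device(double *host_out, int n)
{
    assert(host_out != NULL && n > 0);
    int dev = omp_get_default_device();
    int host = omp_get_initial_device();
    size_t bytes = sizeof(double) * (size_t)n;

    mytensor t;
    t.data = (double *)omp_target_alloc(bytes, dev);
    t.strides = (int *)omp_target_alloc(sizeof(int) * 2, dev);
    t.extents = (int *)omp_target_alloc(sizeof(int) * 2, dev);
    if (!t.data || !t.strides || !t.extents) {
        omp_target_free(t.data, dev);
        omp_target_free(t.strides, dev);
        omp_target_free(t.extents, dev);
        return 1;
    }

    /* Zero the device buffer so that element 0, which the loop skips,
       has a defined value when it is read back. */
    memset(host_out, 0, bytes);
    int rc = omp_target_memcpy(t.data, host_out, bytes, 0, 0, dev, host);

    /* Map only the struct itself; its members already hold device pointers. */
    #pragma omp target enter data map(to: t)

    #pragma omp target teams distribute
    for (int i = 1; i < n; i++) {
        t.data[i] = 20;
    }

    /* Copy the result from the device back to the host for checking. */
    rc |= omp_target_memcpy(host_out, t.data, bytes, 0, 0, host, dev);

    /* Drop the struct mapping before freeing the device buffers. */
    #pragma omp target exit data map(release: t)

    omp_target_free(t.data, dev);
    omp_target_free(t.strides, dev);
    omp_target_free(t.extents, dev);
    return rc != 0;
}
```

Calling fill_on_device(out, 20) with a 20-element double array mirrors the loop bounds in the snippet above. With an offloading compiler this would typically be built with something like gcc -fopenmp plus the offload options for the target; without a device, the target region falls back to the host.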

On my card, with shared memory, both versions work. Unfortunately, the shared
memory it would default to without mapping is very slow.

With map(to:) the trace looks correct: no device-to-host copy of the struct,
just 6 allocations and 3 host-to-device copies.

So it seems I can now write more complex algorithms.

Thanks to everyone for helping out.

Start      Duration    Name              Result  CorrID  Pid   Tid   T-Pri  Thread Name
0,336454s  121,532 ms  cuCtxCreate_v2    0       8       2283  2283  0      OpenMP Initial Thread
0,45863s   2,020 ms    cuLinkCreate_v2   0       24      2283  2283  0      OpenMP Initial Thread
0,497243s  1,488 ms    cuLinkComplete    0       68      2283  2283  0      OpenMP Initial Thread
0,498731s  2,794 ms    cuModuleLoadData  0       69      2283  2283  0      OpenMP Initial Thread
0,501526s  1,180 μs    cuLinkDestroy     0       70      2283  2283  0      OpenMP Initial Thread
0,501892s  13,020 μs   cuMemcpyHtoD_v2   0       82      2283  2283  0      OpenMP Initial Thread
0,501907s  75,841 μs   cuMemAlloc_v2     0       84      2283  2283  0      OpenMP Initial Thread
0,501985s  3,240 μs    cuMemAlloc_v2     0       86      2283  2283  0      OpenMP Initial Thread
0,501989s  2,370 μs    cuMemAlloc_v2     0       88      2283  2283  0      OpenMP Initial Thread
0,501993s  2,260 μs    cuMemAlloc_v2     0       90      2283  2283  0      OpenMP Initial Thread
0,501997s  6,010 μs    cuMemcpyHtoD_v2   0       93      2283  2283  0      OpenMP Initial Thread
0,502006s  2,350 μs    cuMemAlloc_v2     0       95      2283  2283  0      OpenMP Initial Thread
0,502009s  3,491 μs    cuMemcpyHtoD_v2   0       98      2283  2283  0      OpenMP Initial Thread
0,502013s  63,210 μs   cuMemAlloc_v2     0       99      2283  2283  0      OpenMP Initial Thread
0,502077s  111,922 μs  cuLaunchKernel    0       100     2283  2283  0      OpenMP Initial Thread
0,50219s   9,120 μs    cuCtxSynchronize  0       101     2283  2283  0      OpenMP Initial Thread
0,5022s    4,570 μs    cuMemFree_v2      0       104     2283  2283  0      OpenMP Initial Thread
0,502206s  8,760 μs    cuMemFree_v2      0       107     2283  2283  0      OpenMP Initial Thread
0,502216s  2,720 μs    cuMemFree_v2      0       110     2283  2283  0      OpenMP Initial Thread
0,502219s  2,250 μs    cuMemFree_v2      0       113     2283  2283  0      OpenMP Initial Thread
0,50232s   92,081 μs   cuMemFree_v2      0       119     2283  2283  0      OpenMP Initial Thread
0,502413s  63,238 ms   cuCtxDestroy_v2   0       120     2283  2283  0      OpenMP Initial Thread
