https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120753
--- Comment #17 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---

Ah, I forgot that mapping macro... So this is the correct code:

mytensor t;
t.data=(double*)omp_target_alloc(sizeof(double)*20,omp_get_default_device());
t.strides=(int*)omp_target_alloc(sizeof(int)*2,omp_get_default_device());
t.extents=(int*)omp_target_alloc(sizeof(int)*2,omp_get_default_device());

#pragma omp target enter data map(to:t)
#pragma omp target teams distribute
for(int i=1; i<20; i++)
{
    t.data[i]=20;
}

omp_target_free(t.data,omp_get_default_device());
omp_target_free(t.strides,omp_get_default_device());
omp_target_free(t.extents,omp_get_default_device());

On my card, with shared memory, both variants work... Unfortunately, the shared memory it would default to without mapping is very slow...

With map(to:t) the trace looks correct: no down-copy of the struct, and 6 allocations and 3 up-copies...

So it seems I can now write more complex algorithms... Thanks to everyone for helping out...

Start      Duration    Name              Result  CorrID  Pid   Tid   T-Pri  Thread Name
0,336454s  121,532 ms  cuCtxCreate_v2    0       8       2283  2283  0      OpenMP Initial Thread
0,45863s   2,020 ms    cuLinkCreate_v2   0       24      2283  2283  0      OpenMP Initial Thread
0,497243s  1,488 ms    cuLinkComplete    0       68      2283  2283  0      OpenMP Initial Thread
0,498731s  2,794 ms    cuModuleLoadData  0       69      2283  2283  0      OpenMP Initial Thread
0,501526s  1,180 μs    cuLinkDestroy     0       70      2283  2283  0      OpenMP Initial Thread
0,501892s  13,020 μs   cuMemcpyHtoD_v2   0       82      2283  2283  0      OpenMP Initial Thread
0,501907s  75,841 μs   cuMemAlloc_v2     0       84      2283  2283  0      OpenMP Initial Thread
0,501985s  3,240 μs    cuMemAlloc_v2     0       86      2283  2283  0      OpenMP Initial Thread
0,501989s  2,370 μs    cuMemAlloc_v2     0       88      2283  2283  0      OpenMP Initial Thread
0,501993s  2,260 μs    cuMemAlloc_v2     0       90      2283  2283  0      OpenMP Initial Thread
0,501997s  6,010 μs    cuMemcpyHtoD_v2   0       93      2283  2283  0      OpenMP Initial Thread
0,502006s  2,350 μs    cuMemAlloc_v2     0       95      2283  2283  0      OpenMP Initial Thread
0,502009s  3,491 μs    cuMemcpyHtoD_v2   0       98      2283  2283  0      OpenMP Initial Thread
0,502013s  63,210 μs   cuMemAlloc_v2     0       99      2283  2283  0      OpenMP Initial Thread
0,502077s  111,922 μs  cuLaunchKernel    0       100     2283  2283  0      OpenMP Initial Thread
0,50219s   9,120 μs    cuCtxSynchronize  0       101     2283  2283  0      OpenMP Initial Thread
0,5022s    4,570 μs    cuMemFree_v2      0       104     2283  2283  0      OpenMP Initial Thread
0,502206s  8,760 μs    cuMemFree_v2      0       107     2283  2283  0      OpenMP Initial Thread
0,502216s  2,720 μs    cuMemFree_v2      0       110     2283  2283  0      OpenMP Initial Thread
0,502219s  2,250 μs    cuMemFree_v2      0       113     2283  2283  0      OpenMP Initial Thread
0,50232s   92,081 μs   cuMemFree_v2      0       119     2283  2283  0      OpenMP Initial Thread
0,502413s  63,238 ms   cuCtxDestroy_v2   0       120     2283  2283  0      OpenMP Initial Thread
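
Since the trace above contains no device-to-host copy, the values written by the kernel never reach the host. To check them, one option is an explicit omp_target_memcpy back to a host buffer. The following is only a minimal sketch of that idea, not the code from this report: tensor_view is a hypothetical stand-in for mytensor (whose definition is not shown in this comment), fill_on_device is an illustrative helper name, and only the data member is allocated. The loop starts at 0 here, unlike the snippet above, so that every element of the result is defined. When built without OpenMP support the pragmas are ignored and the sketch falls back to plain malloc/memcpy on the host.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for mytensor; the real type is not shown in the report. */
typedef struct {
    double *data;
    int *strides;
    int *extents;
} tensor_view;

/* Fill n doubles with `value` on the default device, then copy them back
 * into host_out. Returns 0 on success, -1 on allocation or copy failure. */
int fill_on_device(double *host_out, int n, double value)
{
    assert(host_out != NULL && n > 0);
    size_t bytes = sizeof(double) * (size_t)n;
    int rc;

#ifdef _OPENMP
    int dev = omp_get_default_device();
    int host = omp_get_initial_device();
    /* Device allocation, as in the report; strides/extents are left unset here. */
    tensor_view t = { (double *)omp_target_alloc(bytes, dev), NULL, NULL };
#else
    /* Assumption: no OpenMP available, so everything stays on the host. */
    tensor_view t = { (double *)malloc(bytes), NULL, NULL };
#endif
    if (t.data == NULL)
        return -1;

    /* Copy the struct itself (including the device pointer value) to the device. */
    #pragma omp target enter data map(to: t)

    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; i++)
        t.data[i] = value;

    /* Explicit down-copy of the payload only; the struct is not copied back. */
#ifdef _OPENMP
    rc = omp_target_memcpy(host_out, t.data, bytes, 0, 0, host, dev);
#else
    memcpy(host_out, t.data, bytes);
    rc = 0;
#endif

    /* Drop the device copy of the struct, then release the device buffer. */
    #pragma omp target exit data map(delete: t)

#ifdef _OPENMP
    omp_target_free(t.data, dev);
#else
    free(t.data);
#endif
    return rc == 0 ? 0 : -1;
}
```

Calling fill_on_device(out, 20, 20.0) with a 20-element host array should leave every element equal to 20. In a profile one would then expect a single additional device-to-host copy for the payload, while the struct itself is still only copied up.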