https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814
--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I want to note that if one comments out the calls

    //A.device_upload(true);
    //B.device_upload(true);
    //C.device_alloc(true);
    // C.host_update(true);

in

    bool matrix_multiply_dot( mdspan<T, CA>& A, mdspan<T, CB>& B, mdspan<T, CC>& C, bool on_gpu=false,bool default_device=true,int devicenum=0)

and replaces them with

    device_datastruct_upload(dA,devicenum);
    device_datastruct_upload(dB,devicenum);
    device_datastruct_alloc(dC,devicenum);
    host_datastruct_update(dC,devicenum);

then the loop of the matrix multiplication recognizes that dA, dB and dC have been offloaded and works fine on the GPU.

It only fails with the member functions of A, B and C, that is, with A.device_upload(true), B.device_upload(true), C.device_alloc(true) and C.host_update(true). These, however, do nothing other than set the default device number and then call the same free functions (device_datastruct_upload, device_datastruct_alloc and host_datastruct_update) with a reference to A.pdatastruct, which is the same object as dA later on, and likewise for B and C.

So the #pragma omp target teams loop should definitely recognize, by the addresses and in both cases, that dA, dB and dC have been offloaded by using omp_has_device_addr...
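
To make the address argument concrete, here is a minimal sketch and not the code from the report. The names datastruct_sketch, mdspan_sketch, device_upload_sketch, device_release_sketch, is_present_sketch and paths_share_address are all hypothetical stand-ins, and the sketch maps only a raw buffer rather than the full descriptor used in the real library. It shows that a member function which merely forwards a reference to its own descriptor hands the free function the same address as a direct call, and how omp_target_is_present could be used to ask the runtime whether a buffer is mapped. The OpenMP calls are guarded so the file also compiles without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the library's descriptor (assumption, not the real type).
template <typename T>
struct datastruct_sketch {
    T* pdata;
    std::size_t pdatalength;
};

// Address of the descriptor seen by the most recent upload call.
static const void* last_uploaded = nullptr;

// Free-function path: maps the buffer and records which descriptor it was given.
template <typename T>
void device_upload_sketch(datastruct_sketch<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.pdatalength;
    #pragma omp target enter data map(to: p[0:n]) device(devicenum)
    last_uploaded = &d;
    (void)p; (void)n; (void)devicenum;  // unused when built without OpenMP
}

// Drops the mapping again so the next upload starts from a clean state.
template <typename T>
void device_release_sketch(datastruct_sketch<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.pdatalength;
    #pragma omp target exit data map(release: p[0:n]) device(devicenum)
    (void)p; (void)n; (void)devicenum;
}

// Member-function path: a thin wrapper that forwards its own descriptor by reference.
template <typename T>
struct mdspan_sketch {
    datastruct_sketch<T> pdatastruct;
    void device_upload(int devicenum) { device_upload_sketch(pdatastruct, devicenum); }
};

// Asks the OpenMP runtime whether the buffer has a mapping on the given device.
template <typename T>
bool is_present_sketch(const datastruct_sketch<T>& d, int devicenum) {
#ifdef _OPENMP
    return omp_target_is_present(d.pdata, devicenum) != 0;
#else
    (void)d; (void)devicenum;
    return true;  // assumption: without offloading the host copy is the only copy
#endif
}

// True when both call paths hand the free function the very same descriptor object.
inline bool paths_share_address() {
    std::vector<double> buf(16, 1.0);
    mdspan_sketch<double> A{{buf.data(), buf.size()}};
    datastruct_sketch<double>& dA = A.pdatastruct;
    int dev = 0;
#ifdef _OPENMP
    dev = omp_get_default_device();
#endif
    device_upload_sketch(dA, dev);           // direct call, as in the working variant
    const void* via_free = last_uploaded;
    device_release_sketch(dA, dev);

    A.device_upload(dev);                    // wrapper call, as in the failing variant
    const void* via_member = last_uploaded;
    device_release_sketch(dA, dev);

    return via_free == via_member && via_free == &dA;
}
```

Calling paths_share_address() from main and checking that it returns true confirms the address identity on the host side. Inserting is_present_sketch(dA, dev) between an upload and the matching release is one way to see what the runtime itself reports for each path.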