https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120814

--- Comment #3 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I want to note that if one comments out

    //A.device_upload(true);
    //B.device_upload(true);
    //C.device_alloc(true);

and

    // C.host_update(true);

in

    bool matrix_multiply_dot(mdspan<T, CA>& A, mdspan<T, CB>& B, mdspan<T, CC>& C, bool on_gpu=false, bool default_device=true, int devicenum=0)


and replaces these calls with

    device_datastruct_upload(dA,devicenum);
    device_datastruct_upload(dB,devicenum);
    device_datastruct_alloc(dC,devicenum);

and

    host_datastruct_update(dC,devicenum);

then the matrix multiplication loop recognizes that dA, dB, and dC have been
offloaded, and it works fine on the GPU.
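The working variant can be sketched as follows. This is only an illustration under assumptions: datastruct here is a hypothetical stand-in holding a host pointer and an element count, the bodies of device_datastruct_upload, device_datastruct_alloc and host_datastruct_update are guesses built on target enter data, target update and target exit data, and device_datastruct_release, multiply and run_example are hypothetical helpers not taken from the report. Because the buffers are already mapped when the kernel starts, its map clauses should only find them by host address and not copy anything.

```cpp
// Hypothetical sketch: these helpers are illustrative stand-ins, not the
// reporter's implementation. Builds with or without -fopenmp.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: a host buffer plus its element count.
template <typename T>
struct datastruct {
    T* pdata;
    std::size_t n;
};

// Copy the buffer to the device and keep it mapped.
template <typename T>
void device_datastruct_upload(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    #pragma omp target enter data map(to: p[0:n]) device(devicenum)
}

// Allocate device storage for the buffer without copying it.
template <typename T>
void device_datastruct_alloc(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    #pragma omp target enter data map(alloc: p[0:n]) device(devicenum)
}

// Copy the device contents back into the host buffer.
template <typename T>
void host_datastruct_update(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    #pragma omp target update from(p[0:n]) device(devicenum)
}

// Hypothetical helper: drop the device copy.
template <typename T>
void device_datastruct_release(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    #pragma omp target exit data map(delete: p[0:n]) device(devicenum)
}

// C (rows x cols) = A (rows x inner) * B (inner x cols), all row-major.
template <typename T>
void multiply(datastruct<T>& dA, datastruct<T>& dB, datastruct<T>& dC,
              std::size_t rows, std::size_t inner, std::size_t cols,
              int devicenum) {
    T* a = dA.pdata;
    T* b = dB.pdata;
    T* c = dC.pdata;
    std::size_t na = dA.n, nb = dB.n, nc = dC.n;
    assert(na == rows * inner && nb == inner * cols && nc == rows * cols);
    // The buffers are already present, so these map clauses should find them
    // by host address and not transfer any data.
    #pragma omp target teams loop collapse(2) device(devicenum) \
        map(to: a[0:na], b[0:nb]) map(tofrom: c[0:nc])
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            T sum = 0;
            for (std::size_t k = 0; k < inner; ++k)
                sum += a[i * inner + k] * b[k * cols + j];
            c[i * cols + j] = sum;
        }
    }
}

// Upload A and B, allocate C, multiply on the device, copy C back.
std::vector<double> run_example(int devicenum = 0) {
    std::vector<double> a = {1, 2, 3, 4};  // 2x2
    std::vector<double> b = {5, 6, 7, 8};  // 2x2
    std::vector<double> c(4, 0.0);
    datastruct<double> dA{a.data(), a.size()};
    datastruct<double> dB{b.data(), b.size()};
    datastruct<double> dC{c.data(), c.size()};

    device_datastruct_upload(dA, devicenum);
    device_datastruct_upload(dB, devicenum);
    device_datastruct_alloc(dC, devicenum);
    multiply(dA, dB, dC, 2, 2, 2, devicenum);
    host_datastruct_update(dC, devicenum);

    device_datastruct_release(dA, devicenum);
    device_datastruct_release(dB, devicenum);
    device_datastruct_release(dC, devicenum);
    return c;
}
```

On the host (no offload device, or built without -fopenmp), run_example() returns {19, 22, 43, 50}.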

It just does not work with the member functions of A, B, and C, i.e. with

    //A.device_upload(true);
    //B.device_upload(true);
    //C.device_alloc(true);

and

    // C.host_update(true);

These member functions, however, do nothing but set the default device number
and then call

    device_datastruct_upload(dA,devicenum);
    device_datastruct_upload(dB,devicenum);
    device_datastruct_alloc(dC,devicenum);

and

    host_datastruct_update(dC,devicenum);

with a reference to A.pdatastruct, which is the same object that is later used
as dA (and likewise for B and C).
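The wrapper pattern described above can be sketched roughly like this. It assumes a hypothetical matrix_view class in place of the reporter's mdspan, a datastruct member named pdatastruct, and a hypothetical select_device helper that reads omp_get_default_device() when default_device is true. The hypothetical last_mapped variable exists only to make the point checkable: the member call passes device_datastruct_upload exactly the same host address that a direct call with dA would, and OpenMP looks up existing mappings by host storage.

```cpp
// Hypothetical sketch: a member function that selects a device and then
// forwards its own datastruct to the free function. Builds with or without
// -fopenmp.
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in: a host buffer plus its element count.
template <typename T>
struct datastruct {
    T* pdata;
    std::size_t n;
};

// Hypothetical instrumentation: the host address most recently mapped.
static const void* last_mapped = nullptr;

template <typename T>
void device_datastruct_upload(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    assert(p != nullptr || n == 0);
    // Later target regions find this mapping by the host address p.
    #pragma omp target enter data map(to: p[0:n]) device(devicenum)
    last_mapped = p;
}

template <typename T>
void device_datastruct_release(datastruct<T>& d, int devicenum) {
    T* p = d.pdata;
    std::size_t n = d.n;
    #pragma omp target exit data map(delete: p[0:n]) device(devicenum)
}

// Hypothetical helper: use the OpenMP default device when asked to.
inline int select_device(bool default_device, int devicenum) {
#ifdef _OPENMP
    if (default_device) return omp_get_default_device();
#else
    if (default_device) return 0;
#endif
    return devicenum;
}

// Hypothetical stand-in for the reporter's mdspan<T, C>.
template <typename T>
class matrix_view {
public:
    datastruct<T> pdatastruct;

    explicit matrix_view(datastruct<T> d) : pdatastruct(d) {}

    // Set the device number, then forward the member datastruct by reference.
    bool device_upload(bool default_device, int devicenum = 0) {
        devicenum_ = select_device(default_device, devicenum);
        device_datastruct_upload(pdatastruct, devicenum_);
        return true;
    }

    void device_release() { device_datastruct_release(pdatastruct, devicenum_); }

private:
    int devicenum_ = 0;
};
```

In this sketch, A.device_upload(true) maps the same host buffer as calling device_datastruct_upload(A.pdatastruct, devicenum) directly, so both paths should leave a later target region with the same view of what is present.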

So in both cases, the

    #pragma omp target teams loop

should definitely recognize from the addresses that dA, dB, and dC have been
offloaded, by using omp_has_device_addr...
