From: Beignet [mailto:[email protected]] On Behalf Of Dehuan Xin
Sent: Sunday, November 15, 2015 5:50 PM
To: [email protected]
Subject: [Beignet] Question on some implementation details: async copy and vector functions

Hi,

I have two questions on the implementation detail of Beignet:

1. I see in this file that async copy is implemented as a for-loop. Does this 
mean async copy is currently a processor copy that is not offloaded to hardware 
such as a DMA engine, and so is not truly `async` either?
http://cgit.freedesktop.org/beignet/tree/backend/src/libocl/src/ocl_async.cl
async_copy is done through ordinary GPU I/O read/write messages. AFAIK, this is 
not a DMA-like transfer.
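As a rough illustration of what "a for-loop over ordinary read/write messages" means, here is a host-side C model. It is not the Beignet source. The names `work_item_copy` and `work_group_copy` are hypothetical, and splitting the elements across work-items with a stride is an assumption about how such a loop is typically written. Each work-item copies every `local_size`-th element with a plain load and store. The copy therefore generates the same memory traffic as any other kernel code instead of being handed to a separate copy engine.

```c
#include <stddef.h>

/* Hypothetical host-side model of a work-group copy written as a plain
   loop (NOT the Beignet ocl_async.cl source). Work-item `local_id`
   copies elements local_id, local_id + local_size, ... using ordinary
   reads and writes; nothing is offloaded to a DMA-style engine. */
static void work_item_copy(int *dst, const int *src, size_t num_elements,
                           size_t local_id, size_t local_size) {
    for (size_t i = local_id; i < num_elements; i += local_size) {
        dst[i] = src[i]; /* one plain read followed by one plain write */
    }
}

/* Runs every work-item of one work-group in sequence. On a GPU these
   iterations would run in parallel, and the whole copy is finished
   only when all work-items have finished their part. */
static void work_group_copy(int *dst, const int *src, size_t num_elements,
                            size_t local_size) {
    for (size_t lid = 0; lid < local_size; ++lid) {
        work_item_copy(dst, src, num_elements, lid, local_size);
    }
}
```

With 10 elements and a work-group of 4, work-item 0 copies elements 0, 4 and 8, work-item 1 copies 1, 5 and 9, and so on.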
2. In this source file, I see that vector loads are scalarized into scalar 
loads. Is this behavior disabled when `__attribute__((vec_type_hint(<typen>)))` is used?
http://cgit.freedesktop.org/beignet/tree/backend/src/libocl/src/ocl_vload.cl
Generally, vload is not recommended. Take vload4() on a uchar* as an example: 
the pointer passed in is only guaranteed to be 1-byte aligned, whereas a uchar4* 
read access guarantees that the pointer is 4-byte aligned. That is a big 
difference to the compiler, and the resulting hardware performance is quite 
different.
So the guideline is to use a uchar4* pointer, in which case the CL programmer 
needs to make sure the pointer is 4-byte aligned. If you cannot guarantee that 
the pointer is 4-byte aligned, you can choose to use vload, which is easy to 
program and [...]. We will re-merge the scalarized loads in the compiler 
backend, so don't worry; see llvm_loadstore_merge.cpp.
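To make the re-merging idea concrete, here is a toy C model of combining contiguous scalar loads into wider ones. It is a sketch under heavy assumptions and does not reproduce the logic of llvm_loadstore_merge.cpp. `count_merged_loads` is a hypothetical helper that takes a sorted list of byte offsets. It ignores alignment, aliasing and everything else a real LLVM pass must check.

```c
#include <stddef.h>

/* Toy model (hypothetical, not Beignet's pass): `offsets` holds the byte
   offsets of scalar loads of `elem_size` bytes each, sorted ascending.
   Runs of contiguous loads (each starting where the previous one ends)
   are merged into one wider load, capped at `max_run` elements.
   Returns the number of loads left after merging. */
static size_t count_merged_loads(const size_t *offsets, size_t n,
                                 size_t elem_size, size_t max_run) {
    size_t merged = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;
        /* Extend the run while the next load is directly adjacent. */
        while (i + run < n && run < max_run &&
               offsets[i + run] == offsets[i + run - 1] + elem_size) {
            ++run;
        }
        ++merged; /* the whole run becomes a single wider load */
        i += run;
    }
    return merged;
}
```

For the four byte loads a scalarized vload4 produces (offsets 0, 1, 2 and 3), this model reports a single merged load. A gap, as in offsets 0, 1 and 3, leaves two loads.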
There is no performance difference between vload4() on an int*/float* pointer 
and a direct int4*/float4* read, because both are 4-byte aligned.
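The alignment argument can be modeled on the host in C. The names `uchar4_t`, `vload4_uchar` and `is_4_byte_aligned` are hypothetical illustrations, not Beignet or OpenCL APIs. A scalarized vload4 is four independent byte loads from `p + 4 * offset`, and these are valid at any address. A single 4-byte read is only safe when the address is a multiple of 4. OpenCL C aligns scalar types to their size, so an `int*` or `float*` already meets that condition. That is the reason given above for vload4 on those pointers matching an `int4*`/`float4*` read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL C's uchar4 vector type. */
typedef struct { unsigned char x, y, z, w; } uchar4_t;

/* Host-side model of a scalarized vload4 on a uchar pointer: four
   separate byte loads starting at p + 4 * offset. Byte loads need only
   1-byte alignment, so this works for any pointer. */
static uchar4_t vload4_uchar(size_t offset, const unsigned char *p) {
    const unsigned char *q = p + 4 * offset;
    uchar4_t v = { q[0], q[1], q[2], q[3] };
    return v;
}

/* Nonzero if p is 4-byte aligned, i.e. one 4-byte read would be legal. */
static int is_4_byte_aligned(const void *p) {
    return ((uintptr_t)p % 4u) == 0;
}
```

With a 4-byte-aligned buffer `buf`, `vload4_uchar(1, buf + 1)` still returns bytes 5 to 8 correctly, even though `buf + 1` itself is not 4-byte aligned and could not be read directly as a uchar4.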

We haven’t done anything special for `__attribute__((vec_type_hint(<typen>)))`.

Thanks!
Ruiling


Regards
Dehuan
_______________________________________________
Beignet mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/beignet
