Forgot to mention, please be noted that the L3 cache in the paper is LLC indeed but not the L3 cache I mentioned in my previous note.
On Wed, Dec 03, 2014 at 03:48:18PM +0800, spring_wind wrote: > Thanks for your reply. > > Now, I really want to know whether clflush instruction evicts all memory line > including the Last Level Cache(LLC), if so, hurt performance defenitly. > > Here is paper I searched which said clflush indeed did like that. > http://eprint.iacr.org/2013/448.pdf > > > > > > > > At 2014-12-03 13:57:36, "Zhigang Gong" <[email protected]> wrote: > >On Wed, Dec 03, 2014 at 01:46:51PM +0800, spring_wind wrote: > >> > >> > >> I noticed that in kernel's source(3.12.13) drm_clflush_page used clflush > >> instruction. > >> I wonder whether the clfush instruction only push data to LLC or RAM? If > >> it is RAM, is it needless since CPU and GPU share LLC. > > > >Right, they share the LLC, but doesn't share other level of caches, > >so when swtich from CPU domain to GPU domain, CPU needs to flush CPU's > >L1/L2 cache to LLC cache. When switch from GPU domain to CPU domain, > >GPU needs to flush GPU's L3 cache to LLC cache. > > > >Please be noted, the L3 cache I mentioned is in GPU domain. > >> > >> We are planning to use beignet to accelerate video analysis and image > >> processing. The data flow will involve input and output between GPU and > >> CPU, we expect it will be not a bottleneck. > >Although cache flush between L3 <--> LLC <--> L2/L1 is not as heavy > >as the flush between LLC and memory, it's still not free. Especially > >if your application need to do that domain switch freqently. > > > >> > >> > >> > >> At 2014-12-03 10:36:28, "Zhigang Gong" <[email protected]> > >> wrote: > >> >On Wed, Dec 03, 2014 at 10:44:26AM +0800, spring_wind wrote: > >> >> 1. clCreateBuffer(CL_MEM_USE_HOST_PTR) > >> >> 2. clMapxx-->change the data pointed by host ptr--->clUnmapxxx > >> >> > >> >> Is that right? > >> >The usual use scenario is as below: > >> > > >> >1. clCreateBuffer(CL_MEM_USE_HOST_PTR) > >> >2. clEnqueueNDRangeKernel() to enqueue some kernel to do some > >> > computation on GPU side and write to the above buffer. > >> >3. clEnqueueMapBuffer() the buffer, and do some computation > >> > on CPU side and modify the buffer content pointed by host_ptr. > >> >4. clEnqueueUnmapBuffer() to give the buffer back to GPU. > >> > > >> >The step 3 and 4 are synchronization points defined by OpenCL > >> >spec which are mandatory on all platforms to access a buffer > >> >pointed by host_ptr shared by GPU and CPU. At step3, not only > >> >a GPU cache flush (flush from GPU's L3 cache to CPU's LLC cache), > >> >but also need to wait for all the kernel enqueued in step 2 > >> >to be finished. > >> > > >> > > >> >> > >> >> I think clUnmapxx must contain cache flush operation although it may > >> >> not involve extra copy, correct me if am wrong. > >> > > >> >You are right. > >> >The cache flush is always a must when we switch between GPU > >> >domain and CPU domain. But you can see, the above flush is > >> >not so heavy, as it's not a flush between cache and slow > >> >memory. > >> > > >> >> I am very concerned about the beignet performance on our product. > >> > > >> >You are welcome to share your detail concerns with us if possible. > >> >And there are also some performance hint in the > >> >'docs/optimization-guide.mdwn'. > >> >We will continue to update that document. > >> > > >> >Thanks, > >> >Zhigang Gong. > >> > > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> 在 2014-12-03 10:14:10,"Guo, Yejun" <[email protected]> 写道: > >> >> > >> >> > >> >> Just need to invoke clMapxx function before you access the host ptr > >> >> from CPU side, and invoke clUnmapxx function after the CPU access. > >> >> There is no extra copy inside clMap/Unmapxx if all the conditions are > >> >> satisfied. You can refer to the code in runtime_use_host_ptr_buffer.cpp > >> >> under beignet/utests. > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> From: spring_wind [mailto:[email protected]] > >> >> Sent: Wednesday, December 03, 2014 10:02 AM > >> >> To: Guo, Yejun > >> >> Cc:[email protected] > >> >> Subject: Re:RE: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy? > >> >> > >> >> > >> >> > >> >> You mean if I use CL_MEM_USE_HOST_PTR and host ptr is page aligned, I > >> >> change the data pointed by host ptr from CPU side, I don't have to do > >> >> any flush opertaion or I should still call clMapxx function? > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> 在 2014-12-03 08:37:15,"Guo, Yejun" <[email protected]> 写道: > >> >> > >> >> > >> >> > >> >> Hi, > >> >> > >> >> > >> >> > >> >> please check the latest code of beignet, there is no copy needed > >> >> between CPU and GPU if the host_ptr provided by application is page > >> >> aligned, and the page align limitation is expected to be removed some > >> >> days later. You can also try CL_MEM_ALLOC_HOST_PTR to avoid the extra > >> >> copy without align limitation. > >> >> > >> >> > >> >> > >> >> Btw, this zero-copy is supported starting from linux kernel 3.16 and > >> >> libdrm 2.4.58. > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> From: Beignet [mailto:[email protected]] On Behalf > >> >> Of spring_wind > >> >> Sent: Tuesday, December 02, 2014 8:38 PM > >> >> To:[email protected] > >> >> Subject: [Beignet] CL_MEM_USE_HOST_PTR involve extra copy? > >> >> > >> >> > >> >> > >> >> Hi: > >> >> > >> >> Intel opencl optimization guide said using CL_MEM_USE_HOST_PTR can > >> >> avoid extra copy between CPU and GPU, but I noticed that in beignet's > >> >> implementation source it was not like that: > >> >> > >> >> /* Copy the data if required */ > >> >> > >> >> if (flags & CL_MEM_COPY_HOST_PTR || flags & CL_MEM_USE_HOST_PTR) > >> >> > >> >> cl_buffer_subdata(mem->bo, 0, sz, data); > >> >> > >> >> > >> >> > >> >> Could someone give me an answer? > >> >> > >> >> > >> >> > >> >> > >> > > >> >> _______________________________________________ > >> >> Beignet mailing list > >> >> [email protected] > >> >> http://lists.freedesktop.org/mailman/listinfo/beignet > >> > > >> >_______________________________________________ > >> >Beignet mailing list > >> >[email protected] > >> >http://lists.freedesktop.org/mailman/listinfo/beignet > >_______________________________________________ > >Beignet mailing list > >[email protected] > >http://lists.freedesktop.org/mailman/listinfo/beignet > _______________________________________________ > Beignet mailing list > [email protected] > http://lists.freedesktop.org/mailman/listinfo/beignet _______________________________________________ Beignet mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/beignet
