On 7 October 2013 00:36, brdavs <brd...@yahoo.com> wrote:
> Hi Jon,
>
> Thank you very much for your kind reply.

You are welcome. I'm keeping the mypaint-discuss mailing list CC'ed so
others can follow as well.

> I have several questions / comments:
> 1. Why do you think that OpenCL is a better solution
> than doing it purely in OpenGL?
>
> One problem I see is that if you are drawing a lot of
> small dabs, OpenCL will probably be very inefficient,
> because you won't be doing much work for each dab.
> You'll be launching a kernel a lot which is not negligible.

OpenCL is a more general programming model, so it will probably be
easier to translate the existing code and concepts in a meaningful way.
Also, I've programmed GPUs in this manner before (using CUDA), whereas
I've never done non-trivial OpenGL.
If one can make it work in OpenGL/GLSL, that would probably be ideal,
both from a performance and an availability perspective. Any ideas on
how to realize the algorithm there would be very welcome!

> In addition to a surface backend, you will likely want a way to
> display the surface on screen - so that is the other thing that needs
> implementing. This could/should be OpenGL based, using the
> OpenCL+OpenGL interoperability if the surf backend is in OpenCL.
>
> For making full advantage of such a backend in MyPaint itself, one
> would also have to implement layers etc. on the GPU side.
>
> 2. Even if you can batch a bunch of dabs together, the dabs will overlap
> and you would have to resort to atomic operations (slower) for proper
> blending.

The CPU backend has an operations queue in which dabs are batched, in a
tile-wise manner. See operationsqueue.c and mypaint-tiledsurface.c.
No concurrency is attempted between individual dabs; ordering is
preserved by the queue. Threads work on individual tiles, with no
synchronization necessary between them.
On the GPU one would ideally like every (output) pixel to be computed
concurrently without sync.
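
To make the tile-wise queuing above concrete, here is a minimal C sketch.
It is not the code in operationsqueue.c or mypaint-tiledsurface.c: the
names Dab, TileQueue, Surface, queue_dab, process_tile and process_all,
the 16-pixel tile size, and the single-channel blend are all assumptions
made for illustration.

```c
#include <assert.h>
#include <string.h>

#define TILE_SIZE 16   /* pixels per tile edge (assumed for this sketch) */
#define TILES_X 4
#define TILES_Y 4
#define MAX_DABS 32    /* per-tile queue capacity (assumed) */

/* Hypothetical dab; the real brush engine passes more parameters. */
typedef struct { float x, y, radius, opacity; } Dab;

typedef struct { Dab dabs[MAX_DABS]; int count; } TileQueue;

typedef struct {
    TileQueue queue[TILES_Y][TILES_X];
    float pixel[TILES_Y * TILE_SIZE][TILES_X * TILE_SIZE]; /* one coverage channel */
} Surface;

void surface_init(Surface *s) { memset(s, 0, sizeof *s); }

/* Map a pixel coordinate to a tile index, clamped to [0, max_tile]. */
int clamp_tile(float v, int max_tile) {
    if (v < 0) return 0;
    int t = (int)(v / TILE_SIZE);
    return t > max_tile ? max_tile : t;
}

/* Append the dab to the queue of every tile its bounding box touches.
 * Queue order is submission order. Returns the number of tiles queued on. */
int queue_dab(Surface *s, Dab d) {
    if (d.x + d.radius < 0 || d.y + d.radius < 0 ||
        d.x - d.radius >= TILES_X * TILE_SIZE ||
        d.y - d.radius >= TILES_Y * TILE_SIZE)
        return 0; /* entirely off-surface */
    int tx0 = clamp_tile(d.x - d.radius, TILES_X - 1);
    int tx1 = clamp_tile(d.x + d.radius, TILES_X - 1);
    int ty0 = clamp_tile(d.y - d.radius, TILES_Y - 1);
    int ty1 = clamp_tile(d.y + d.radius, TILES_Y - 1);
    int n = 0;
    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++) {
            TileQueue *q = &s->queue[ty][tx];
            assert(q->count < MAX_DABS); /* a real queue would grow or flush */
            q->dabs[q->count++] = d;
            n++;
        }
    return n;
}

/* Apply one tile's queued dabs in order. Only this tile's pixels are
 * written, so different tiles could go to different threads without locks. */
void process_tile(Surface *s, int tx, int ty) {
    TileQueue *q = &s->queue[ty][tx];
    for (int i = 0; i < q->count; i++) {
        Dab d = q->dabs[i];
        for (int y = ty * TILE_SIZE; y < (ty + 1) * TILE_SIZE; y++)
            for (int x = tx * TILE_SIZE; x < (tx + 1) * TILE_SIZE; x++) {
                float dx = x - d.x, dy = y - d.y;
                if (dx * dx + dy * dy <= d.radius * d.radius)
                    s->pixel[y][x] += d.opacity * (1.0f - s->pixel[y][x]);
            }
    }
    q->count = 0; /* queue consumed */
}

/* Serial here; each iteration is independent of the others. */
void process_all(Surface *s) {
    for (int ty = 0; ty < TILES_Y; ty++)
        for (int tx = 0; tx < TILES_X; tx++)
            process_tile(s, tx, ty);
}
```

Queuing two dabs of opacity 0.5 at (16, 16) with radius 4 puts both on
four tiles. After process_all() the covered pixels hold 0.75, the same
result as applying the dabs one after another on a flat buffer, because
each tile replays its own queue in submission order.
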

One could perhaps apply the same queuing principle, but probably use a
vector for the depth, and store the queue in a form suitable for
consumption by individual thread warps (16/32 threads on the GPUs I am
used to).
But starting with the naive one-kernel-per-dab approach is probably
best. Once we have that, we can think of smarter ways of doing things.

> 3. Is there a super simple working example of the brushlib
> painting a (hardwired) stroke with a configurable brush?
>
> I would be interested in trying a few simple things in
> OpenCL (no tiling, etc.) and would be good to have
> a simple starting point in C/C++ and a reference
> implementation.

Sadly the "minimal.c" example is broken right now. The API usage as
shown is fine; it's just the simple surface that is borked, due to wrong
buffer stride handling in the backend.
I will have a look at fixing it up later, but I would not wait for it.
For GPU things, you'll need a lot of additional scaffolding (OpenCL
setup etc.), so maybe copy minimal.c and start adding that. I suggest
you put the code in a subdirectory as it will have extra dependencies,
for instance "brushlib/cl/".

> BlackInk seems to have a very impressive paint engine implemented
> on the GPU:
> http://www.bleank.com/BlackInk-a115.html
>
>
> Marko
>

--
Jon Nordby - www.jonnor.com
_______________________________________________
Mypaint-discuss mailing list
Mypaint-discuss@gna.org
https://mail.gna.org/listinfo/mypaint-discuss