Hi,

I have created an OpenCL implementation of the bokeh blur. I got a
speedup (2 times faster) on my old hardware, but also some stability
issues. I think some of these issues come from my old NVIDIA card and
its OpenCL drivers, but I would really like to know how other hardware
setups fare.
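For those curious about the kernel itself: the heart of the
implementation is a gather over a disc-shaped neighbourhood. A
simplified sketch of the idea (illustrative only — this is not the
literal patch code, and it ignores the bokeh shape mask):

/* Gather-style blur: average all source pixels inside a disc of the
 * given radius around each output pixel. Assumes a flat RGBA float
 * buffer; "radius" would come from the node settings. */
__kernel void bokeh_blur(__global const float4 *src,
                         __global float4 *dst,
                         const int width,
                         const int height,
                         const int radius)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    float4 sum = (float4)(0.0f);
    float count = 0.0f;
    for (int dy = -radius; dy <= radius; dy++) {
        for (int dx = -radius; dx <= radius; dx++) {
            if (dx * dx + dy * dy > radius * radius)
                continue; /* sample lies outside the disc */
            int sx = clamp(x + dx, 0, width - 1);
            int sy = clamp(y + dy, 0, height - 1);
            sum += src[sy * width + sx];
            count += 1.0f;
        }
    }
    dst[y * width + x] = sum / count;
}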
I get a random out-of-resources error that I cannot influence from
code, and the UI freezes during long calculations. The current patch
can be found at http://sicg.atmind.nl/media/patches/patch-opencl-bokeh.txt

Regards,
Jeroen

On 08/29/2010 01:10 PM, Vilem Novak wrote:
> Hello, maybe focusing on the performance-heavier nodes would make
> sense? In my experience the quality blurs (especially Defocus), UV
> remap and bilateral blur can be rather performance-heavy.
> With these the advantages would be visible even with the bus problems.
> With regards,
> Vilem Novak
>
>> ------------ Original message ------------
>> From: Jeroen Bakker <[email protected]>
>> Subject: Re: [Bf-committers] Blender and OpenCL
>> Date: 29.8.2010 11:19:15
>> ----------------------------------------
>> Hi Lukas,
>>
>> Your explanation is a good one; I hadn't thought of writing it down
>> that way.
>>
>> The issue with memory during compositing is the way the node editor
>> works. When changing a node value (like the degree of a Rotate
>> node), only that node and all its dependent nodes are recalculated.
>> The input image is not recalculated; it is still in memory. This is
>> a good optimization during editing, since only part of the node
>> system needs to be re-evaluated, but in complex node systems I think
>> this will not work for OpenCL because of the memory it needs.
>>
>> I am looking for a setup that works well both during editing
>> (decreasing the feedback time for the end user) and during rendering
>> (overall performance of the system), but I haven't found a good
>> solution yet.
>>
>> At the moment I am evaluating two approaches:
>> a. per Viewer and Composite node, a single OpenCL kernel/program is
>> generated and executed;
>> b. per node, a program and kernel is created, and evaluation is done
>> as in the current situation.
>>
>> A question back: have you seen any speed-up? On my system (a
>> three-year-old dual core 2...@2000MHz laptop with 1...@400MHz NVIDIA
>> cores and an 800MHz bus) I was not able to see big differences. I
>> think a desktop system with a faster bus and more, and more
>> powerful, GPU cores would get much better performance.
>>
>> Regards,
>> Jeroen
>>
>> On 08/28/2010 09:40 PM, Lukas Tönne wrote:
>>
>>> I have tried out your patch, nice work :)
>>>
>>> Here are some more thoughts on how to process data in the node
>>> tree. I hope I'm not getting too verbose or telling you guys
>>> obvious stuff ;)
>>>
>>> Basically, when talking about data in the tree I see two different
>>> types of dependency:
>>> 1. Inter-node dependency ("vertical"):
>>> A node can only be executed (be it for a single pixel or the whole
>>> image) when all its inputs are done. This dependency _always_
>>> exists in node trees to a certain degree.
>>> 2. Inter-element dependency ("horizontal"):
>>> An element (pixel, sample, particle, vertex, etc.) depends on the
>>> state of other elements (neighbouring pixels, particles within a
>>> certain radius, connected vertices).
>>>
>>> Vertical dependency does not depend on the tree type, but only on
>>> the connectivity of the nodes (complexity of the tree). Here's a
>>> made-up example with strong connectivity in the middle part:
>>> http://www.pasteall.org/pic/5405
>>>
>>> Horizontal (inter-element) dependency, on the other hand, chiefly
>>> depends on the type of tree you're looking at:
>>> * Shader and texture trees have _no_ horizontal dependency at all:
>>> the color of a material or texture sample does not depend on other
>>> samples. This is why shader trees can be evaluated per sample and
>>> do not need to store large amounts of data.
>>> * Compositor trees are the other extreme: while some nodes, such as
>>> Mix, operate per pixel, others like Blur and Defocus depend heavily
>>> on neighbouring or even _all_ other pixels of their input images.
>>> * Particles are not as extreme as compositor trees (fewer
>>> neighbours to take into account), but they lack the inherent
>>> ordering of image pixels and need kd-trees for finding neighbours.
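To make the horizontal part concrete: a node with no inter-element
dependency maps element i of its inputs to element i of its output, so
it parallelizes trivially and could even be evaluated per sample. A
minimal OpenCL sketch of such a per-pixel node (illustrative only, not
code from the patch):

/* Per-pixel mix node: no horizontal dependency, since output element
 * i reads only element i of each input. */
__kernel void node_mix(__global const float4 *a,
                       __global const float4 *b,
                       __global float4 *dst,
                       const float fac,
                       const int size)
{
    int i = get_global_id(0);
    if (i >= size)
        return;
    dst[i] = mix(a[i], b[i], fac);
}

A Blur or Defocus kernel, by contrast, has to read a whole
neighbourhood of the input for every output pixel, which is why its
complete input buffer must stay resident in memory.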
>>> * Compositor tree are the other extreme: while some nodes, such as >>> Mix, operate per-pixel, others like Blur and Defocus heavily depend on >>> neighbouring or even _all_ other pixels of the input images >>> respectively. >>> * Particles are not as extreme as compo trees (less neighbours to take >>> into account), but they lack the inherent ordering of image pixels and >>> need kd trees for finding neighbours. >>> >>> One relatively simple thing one could probably do to decrease memory >>> usage is removing data that is not needed any more (I am not sure if >>> the current compositors do something like this already, if so, just >>> skip this section). As soon as all nodes, which use a certain socket >>> for input, have been processed, that sockets data can be freed from >>> memory. This of course only works as long as connectivity is >>> relatively low and node relations are "local". In the example above >>> the result of the Blur node would have to be kept in memory until all >>> the mix nodes are finished, whereas the initial renderlayer node could >>> free its buffer right after Blur is done. It might even be an option >>> to bite the bullet, if memory usage gets dangerously high, and discard >>> intermediate results used very late in the tree and recalculate them >>> later. >>> >>> Another improvement i currently use in the simulation trees is >>> splitting the large data blocks into smaller parts ("batches"). This >>> has the advantage of making better use of available processing power, >>> especially when some nodes need significantly more time than others. >>> In the compositor nodes one thread processes the full image for one >>> node at a time, which can lead to threads idly waiting for the result >>> of one other (iirc Brecht recently coded internal multithreading for >>> the especially heavy Defocus node though). At the same time by staying >>> with one node for a range of elements instead of processing them >>> one-by-one avoids the overhead of switching between nodes. Afaik this >>> is basically the same concept as OpenCLs "work groups", have to read >>> up on that again. >>> >>> Cheers >>> Lukas >>> >>> On Tue, Aug 24, 2010 at 7:18 PM, Jeroen Bakker<[email protected]> wrote: >>> >>> >>>> Hi all >>>> >>>> I have been experimenting with OpenCL and are planning a basic framework >>>> to support it in Blender. >>>> >>>> main features are: >>>> * OpenCL is disabled by default, CPU fall-back must ALWAYS be >>>> available. OpenCL can be enabled with command-line parameter >>>> * Compiler directive to completely disable OpenCL in Blender. >>>> * Basic implementation to access and use GPU-devices >>>> * I am not targeting the blender-render, but other time-consuming >>>> processes (fluids, node systems etc) >>>> >>>> I think this matches the basic blender principles: >>>> * can work on standard home PC's >>>> * blender installation is unzipping an zip >>>> >>>> Are other people also busy with this subject? 
>>>> >>>> Best regards, >>>> Jeroen >>>> >>>> http://wiki.blender.org/index.php/User_talk:Jbakker >>>> _______________________________________________ >>>> Bf-committers mailing list >>>> [email protected] >>>> http://lists.blender.org/mailman/listinfo/bf-committers >>>> >>>> >>>> >>> _______________________________________________ >>> Bf-committers mailing list >>> [email protected] >>> http://lists.blender.org/mailman/listinfo/bf-committers >>> >>> >>> >> _______________________________________________ >> Bf-committers mailing list >> [email protected] >> http://lists.blender.org/mailman/listinfo/bf-committers >> >> >> >> > _______________________________________________ > Bf-committers mailing list > [email protected] > http://lists.blender.org/mailman/listinfo/bf-committers _______________________________________________ Bf-committers mailing list [email protected] http://lists.blender.org/mailman/listinfo/bf-committers
