Here I have a few math node examples if you need some :)
http://tinyurl.com/327hgjw
2010/8/29 Vilem Novak pildano...@post.cz
Hello, maybe focusing on the performance-heavy nodes would make sense?
In my experience, the heavy ones are the quality blurs (especially
Defocus), UV remap and bilateral blur.
With these, the advantages would be visible even with the bus limitations.
With regards,
Vilem Novak
Original message
From: Jeroen Bakker j.bak...@atmind.nl
Subject: Re: [Bf-committers] Blender and OpenCL
Date: 29.8.2010 11:19:15
Hi Lukas,
Your explanation is a good one. I hadn't thought of putting it that way.
The memory issue during compositing comes from the way the node editor
works. When a node value changes (like the rotation angle), only the Rotate
node and the nodes that depend on it are recalculated. The input image is
not recalculated; it is still in memory. This is a good optimization while
editing, since you only need to re-evaluate part of the node system, but
for complex node systems I think it will not work with OpenCL because of
the memory it needs.
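A minimal Python sketch of this partial re-evaluation, under loud assumptions: the `NodeTree` class, its methods and the toy "rotate" node (which just adds an angle to each value) are hypothetical stand-ins, not Blender's compositor code. Changing a parameter marks that node and everything downstream as dirty; clean upstream results stay cached in memory, which is exactly the memory cost described above.

```python
from collections import defaultdict


class NodeTree:
    """Hypothetical toy node graph with cached results (not Blender's API)."""

    def __init__(self):
        self.funcs = {}     # node name -> function computing the node's output
        self.inputs = {}    # node name -> list of input node names
        self.cache = {}     # node name -> last result, kept in memory
        self.dirty = set()  # nodes whose cached result is out of date
        self.compute_count = defaultdict(int)

    def add(self, name, func, inputs=()):
        self.funcs[name] = func
        self.inputs[name] = list(inputs)
        self.dirty.add(name)

    def mark_dirty(self, name):
        """Mark a changed node and every node downstream of it."""
        stack = [name]
        while stack:
            node = stack.pop()
            if node in self.dirty:
                continue  # already dirty, so its dependents are dirty too
            self.dirty.add(node)
            stack.extend(n for n, ins in self.inputs.items() if node in ins)

    def evaluate(self, name):
        """Recompute only dirty nodes; clean inputs come from the cache."""
        if name in self.dirty:
            args = [self.evaluate(i) for i in self.inputs[name]]
            self.cache[name] = self.funcs[name](*args)
            self.compute_count[name] += 1
            self.dirty.discard(name)
        return self.cache[name]


params = {"angle": 0}
tree = NodeTree()
tree.add("image", lambda: [1, 2, 3])
tree.add("rotate", lambda img: [v + params["angle"] for v in img], ["image"])
tree.add("viewer", lambda img: sum(img), ["rotate"])

first = tree.evaluate("viewer")   # computes image, rotate and viewer
params["angle"] = 10
tree.mark_dirty("rotate")         # the input image stays cached
second = tree.evaluate("viewer")  # recomputes only rotate and viewer
```

The input image is computed once and reused; the price is that every cached result stays resident, which is what becomes a problem in large trees on a GPU with limited memory.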
I am looking for an approach that works well both during editing (shorter
feedback time for the user) and during rendering (overall performance of
the system), but I haven't found a good solution yet.
At the moment I am evaluating two options:
a. an OpenCL program/kernel is generated and executed per viewer and
compositor node.
b. a program and kernel are created per node, and evaluation is done as
it is now.
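To make the memory difference between the two options concrete, here is a rough Python sketch for a chain of purely per-pixel nodes. The function names and the buffer counting are illustrative assumptions, and nodes such as Blur, which read neighbouring pixels, could not be fused this simply. Option (b) keeps one full-image buffer per node, while option (a) generates one "kernel" that runs the whole chain on each pixel.

```python
def evaluate_per_node(pixels, ops):
    """Option (b) sketch: each node processes the whole image and its output
    buffer is kept, as the editor cache would keep it."""
    buffers = [list(pixels)]
    for op in ops:
        buffers.append([op(p) for p in buffers[-1]])
    return buffers[-1], len(buffers)  # result, full-image buffers held


def evaluate_fused(pixels, ops):
    """Option (a) sketch: one generated kernel applies the whole chain to each
    pixel, so only the input and output images exist."""
    def kernel(p):
        for op in ops:
            p = op(p)
        return p
    return [kernel(p) for p in pixels], 2  # input + output buffer


ops = [lambda p: p * 2, lambda p: p + 1, lambda p: min(p, 10)]
pixels = [1, 2, 3, 7]
per_node_result, per_node_buffers = evaluate_per_node(pixels, ops)
fused_result, fused_buffers = evaluate_fused(pixels, ops)
```

Both give the same pixels; presumably the cost of (a) is that there are no per-node cached results, so a parameter change reruns the whole kernel.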
A question in return: have you seen any speed-up? On my system (a
three-year-old dual core 2...@2000mhz laptop with 1...@400mhz nvidia cores
and an 800 MHz bus) I could not see big differences. I think a desktop
system with a faster bus and more, and more powerful, GPU cores would get
much better performance.
Regards,
Jeroen
On 08/28/2010 09:40 PM, Lukas Tönne wrote:
I have tried out your patch, nice work :)
Here are some more thoughts on how to process data in the node tree. I
hope I'm not getting too verbose or telling you guys obvious stuff ;)
Basically, when talking about data in the tree, I see two different
types of dependency:
1. Inter-node dependency (vertical):
A node can only be executed (be it for a single pixel or the whole
image) when all its inputs are done. This dependency _always_ exists
in node trees to a certain degree.
2. Inter-element dependency (horizontal):
An element (pixel, sample, particle, vertex, etc.) depends on the
state of other elements (neighbouring pixels, particles in a certain
radius, connected vertices).
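The vertical dependency amounts to running the nodes in a topological order. A minimal Python sketch using Kahn's algorithm (a standard technique, not necessarily what Blender does) on a made-up tree: a Render Layers node feeding a Blur whose result is used by several Mix nodes. The node names and `execution_order` are hypothetical.

```python
from collections import deque


def execution_order(inputs):
    """Return an order in which every node comes after all of its inputs.

    `inputs` maps each node name to the list of node names it reads from;
    every input must itself be a key. Raises ValueError on a cycle.
    """
    pending = {n: len(ins) for n, ins in inputs.items()}  # unfinished inputs
    users = {n: [] for n in inputs}                        # reverse edges
    for n, ins in inputs.items():
        for i in ins:
            users[i].append(n)
    ready = deque(n for n, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in users[node]:
            pending[user] -= 1
            if pending[user] == 0:  # all inputs of `user` are done
                ready.append(user)
    if len(order) != len(inputs):
        raise ValueError("node tree contains a cycle")
    return order


tree = {
    "render_layers": [],
    "blur": ["render_layers"],
    "mix1": ["blur"],
    "mix2": ["blur", "mix1"],
    "mix3": ["blur", "mix2"],
    "composite": ["mix3"],
}
order = execution_order(tree)
```

Nodes that end up in the `ready` queue at the same time have no dependency on each other, so in principle they could run in parallel.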
Vertical dependency does not depend on the tree type, but only on the
connectivity of the nodes (complexity of the tree). Here's a made-up
example with strong connectivity in the middle part:
http://www.pasteall.org/pic/5405
Horizontal (inter-element) dependency, on the other hand, depends chiefly
on the type of tree you're looking at:
* Shader and texture trees have _no_ horizontal dependency at all: the
color of a material or texture sample does not depend on other
samples. This is why shader trees can be evaluated per sample and do
not need to store large amounts of data.
* Compositor trees are the other extreme: while some nodes, such as
Mix, operate per pixel, others, like Blur and Defocus, depend heavily on
neighbouring or even _all_ other pixels of the input images,
respectively.
* Particles are not as extreme as compositor trees (fewer neighbours to
take into account), but they lack the inherent ordering of image pixels
and need kd-trees for finding neighbours.
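A small Python sketch of the compositor contrast, on a single row of grey values. The `mix_pixel` and `box_blur_pixel` helpers are hypothetical, and a box blur stands in for the compositor's Blur node. The per-pixel node only needs input `i` to produce output `i`, whereas the blur needs a window of neighbours, which is why such a node cannot simply be evaluated one sample at a time like a shader.

```python
def mix_pixel(a, b, factor):
    """Per-pixel node (no horizontal dependency): output i needs only input i."""
    return a * (1.0 - factor) + b * factor


def box_blur_pixel(row, i, radius):
    """Neighbourhood node (horizontal dependency): output i needs inputs
    i - radius .. i + radius, clamped to the image border."""
    lo = max(0, i - radius)
    hi = min(len(row), i + radius + 1)
    window = row[lo:hi]
    return sum(window) / len(window)


row = [0.0, 0.0, 1.0, 0.0, 0.0]
mixed = [mix_pixel(a, 1.0, 0.5) for a in row]
blurred = [box_blur_pixel(row, i, 1) for i in range(len(row))]
```

Doubling `radius` doubles the window each output pixel reads, and a node like Defocus can in the worst case need the whole input image.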
One relatively simple thing one could probably do to decrease memory
usage is removing data that is no longer needed (I am not sure if the
current compositor does something like this already; if so, just skip
this section). As soon as all nodes that use a certain socket as input
have been processed, that socket's data can be freed from memory. This of
course only works as long as connectivity is relatively low and node
relations are local. In the example above, the result of the Blur node
would have to be kept in memory until all the Mix nodes are finished,
whereas the initial renderlayer node could free its buffer right after
Blur is done. If memory usage gets dangerously high, it might even be an
option to bite the bullet: discard intermediate results that are used
very late in the tree and recalculate them later.
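A minimal Python sketch of freeing socket data with a consumer count, assuming nodes are already in execution order and each node produces a single output buffer (real nodes have several output sockets, so the count would be kept per socket). The tree is made up in the spirit of the example: the Render Layers result is freed right after Blur, while Blur's result stays alive until the last Mix node has run. `run_with_freeing` and the node names are hypothetical.

```python
def run_with_freeing(order, inputs, compute):
    """Run nodes in `order` and free each output as soon as its last
    consumer has run. Returns (buffers still held, peak buffers held)."""
    remaining = {n: 0 for n in order}  # consumers that still need node n
    for n in order:
        for i in inputs[n]:
            remaining[i] += 1
    buffers = {}
    peak = 0
    for n in order:
        buffers[n] = compute(n, [buffers[i] for i in inputs[n]])
        peak = max(peak, len(buffers))
        for i in inputs[n]:
            remaining[i] -= 1
            if remaining[i] == 0:
                del buffers[i]  # last consumer is done: free the buffer
    return buffers, peak


inputs = {
    "render_layers": [],
    "blur": ["render_layers"],
    "mix1": ["blur"],
    "mix2": ["blur", "mix1"],
    "mix3": ["blur", "mix2"],
    "composite": ["mix3"],
}
order = ["render_layers", "blur", "mix1", "mix2", "mix3", "composite"]
# Stand-in for real image processing: each "buffer" is just a number.
held, peak = run_with_freeing(order, inputs, lambda n, args: sum(args) + 1)
```

Without freeing, all six buffers would still be held at the end; with it, at most three exist at once and only the final output survives.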
Another improvement I currently use in the simulation trees is
splitting the large data blocks into smaller parts (batches). This
has the advantage of making better use of available processing power,
especially when some nodes need significantly more time than others.
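A rough Python sketch of batch processing, assuming a per-pixel operation (neighbourhood nodes would additionally need a margin of pixels around each batch). `make_batches` and `process_batched` are hypothetical names. Because CPython threads share the GIL, this only illustrates the scheduling idea, idle workers picking up the next small batch instead of one thread owning a whole image, and does not demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(n, batch_size):
    """Split element indices 0..n-1 into (start, end) ranges of at most batch_size."""
    return [(s, min(s + batch_size, n)) for s in range(0, n, batch_size)]


def process_batched(pixels, op, batch_size=4, workers=2):
    """Apply a per-pixel op batch by batch on a small thread pool; each batch
    writes a disjoint slice of the output list."""
    out = [None] * len(pixels)

    def run(batch):
        start, end = batch
        for i in range(start, end):
            out[i] = op(pixels[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(run, make_batches(len(pixels), batch_size)))
    return out


pixels = list(range(10))
batches = make_batches(len(pixels), 3)
result = process_batched(pixels, lambda p: p * p, batch_size=3)
```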
In the compositor nodes, one thread processes the full image for one
node at a time, which can lead to