On Sat, Mar 21, 2015 at 12:30 AM, Christopher Sean Morrison <brl...@mac.com>
...

> Something to consider, you could even propose *only* focusing on this
> aspect for GSoC.  What I mean is that you could spend time restructuring
> the rt dispatch logic, which is something like
> forAllPixels->shootOneRay->traverseScene->evalHits->weaveBooleans->colorizePixel
> that iterates over all pixels depth-first pipeline style.  You’d
> restructure it into something like phase1:
> forAllPixels->shootOneRay->traverseScene->evalHits then phase2:
> CLweaveAllBooleans->CLcolorizePixels.
>
...

> That’s why I’d suggest either focusing only on boolean weaving, or on
> bundled ray dispatch+traversal, or hit+result gathering, etc — something
> that could be put into immediate use, even if it’s not going to give the
> 10x speedup until the rest of the pipeline is converted.  Changing
> BRL-CAD’s render pipeline to support this style of evaluation is going to
> be a lot of work.
>

I looked some more at the code and, yes, the boolean weaving step is
branch-heavy and has gotos in it. So it is most likely a poor candidate for
GPU acceleration, since a GPU has roughly a quarter of the clock speed and
worse branch prediction hardware than a CPU. We want something that is
compute-heavy and has low thread divergence.
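
To illustrate the divergence concern, here is a toy cost model in Python.
It is only a sketch under a simplifying assumption (a warp runs every
distinct branch path taken by any of its threads, one after another); the
path names and costs are made up and are not measured from BRL-CAD.

```python
# Toy SIMT cost model (an assumption for illustration, not a GPU simulator):
# a warp executes each distinct branch path taken by any of its threads,
# serially, so its cost is the sum over the distinct paths.

def warp_cost(paths, path_cost):
    """Cost of one warp: sum of the costs of the distinct paths taken."""
    return sum(path_cost[p] for p in set(paths))

# Hypothetical per-path costs in arbitrary units.
PATH_COST = {"union": 4, "subtract": 6, "intersect": 5}

# 32 threads that all take the same path.
coherent = ["union"] * 32
# 32 threads that disagree on the path.
divergent = ["union", "subtract", "intersect", "union"] * 8

coherent_cost = warp_cost(coherent, PATH_COST)
divergent_cost = warp_cost(divergent, PATH_COST)
print(coherent_cost, divergent_cost)
```

Under this model the divergent warp pays for all three paths while the
coherent one pays for a single path, which is why branch-heavy code tends
to map poorly to the GPU.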

So I think it is best to focus on ray dispatch+traversal for now:
generate rays in bundles, traverse some sort of acceleration structure,
gather hit candidate IDs, and compute the hit intersections on the CPU.
Keeping the primitive tests off the GPU will reduce the performance
advantage, though, since we will spend time transferring data back and forth.
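
A rough sketch of the data flow I have in mind, written in Python for
brevity. Everything here is hypothetical: the names are not BRL-CAD API, a
flat list of bounding boxes stands in for a real acceleration structure,
and the rays are a simple orthographic grid.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box standing in for a primitive's bounds."""
    lo: tuple
    hi: tuple


def make_bundle(width, height, origin_z=-10.0):
    """Generate one orthographic ray per pixel as (origin, direction)."""
    return [((x + 0.5, y + 0.5, origin_z), (0.0, 0.0, 1.0))
            for y in range(height) for x in range(width)]


def ray_hits_box(origin, direction, box):
    """Slab test: does the ray (t >= 0) overlap the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box.lo, box.hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def gather_candidates(bundle, boxes):
    """Traversal stage for the whole bundle: candidate IDs per ray."""
    return [[i for i, b in enumerate(boxes) if ray_hits_box(o, d, b)]
            for o, d in bundle]


boxes = [Box((0, 0, 0), (2, 2, 1)), Box((1, 1, 2), (4, 4, 3))]
bundle = make_bundle(4, 4)
candidates = gather_candidates(bundle, boxes)
print(candidates[0], candidates[5], candidates[15])
```

Only the per-ray candidate ID lists would need to come back from the
device; the exact primitive intersections would then run on the CPU
against those candidates.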

On Sat, Mar 21, 2015 at 7:19 AM, Christopher Sean Morrison <brl...@mac.com>
wrote:

> As for performance, weaving can be a big or small factor, as it’s heavily
> dependent on the geometry being rendered, number of primitives, number of
> regions, depth of hierarchy, number of operations, etc.  A traditional
> model like havoc.g or m35.g in our sample db directory should benefit
> greatly while a pure NURBS model (e.g., no Boolean operations) won’t
> benefit nearly as much.  We’re not going to see huge performance until the
> whole pipeline is connected, and that’s okay if users get speedups for some
> geometry situations.  Generally speaking, dispatched traversal is roughly
> usually 5-15%, prep is 0-20% (except brep), shot intersection testing is
> 30-70%, weaving is 5-30%, and shading is 0-25%.  Models with complex
> entities spend more time in prep and shot; models with complex hierarchies
> spend more time in weaving and traversal.
>
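
To put rough numbers on this, Amdahl's law gives the overall speedup when
only one stage is accelerated. The sketch below uses the stage shares
quoted above; the 10x per-stage speedup is purely an assumed figure for
illustration, and it ignores transfer overhead.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is accelerated by a factor of `stage_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# (low, high) share of render time per stage, as quoted above.
STAGE_SHARE = {
    "traversal": (0.05, 0.15),
    "prep": (0.00, 0.20),
    "shot": (0.30, 0.70),
    "weaving": (0.05, 0.30),
    "shading": (0.00, 0.25),
}

ASSUMED_STAGE_SPEEDUP = 10.0  # hypothetical, not a measurement

for stage, (low, high) in STAGE_SHARE.items():
    best = overall_speedup(high, ASSUMED_STAGE_SPEEDUP)
    worst = overall_speedup(low, ASSUMED_STAGE_SPEEDUP)
    print(f"{stage:9s} {worst:.2f}x .. {best:.2f}x overall")
```

Even a stage that takes 30% of the time caps the overall gain well below
2x on its own, which matches the point that the big wins only come once
the whole pipeline is connected.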

The shading pass also seems like low-hanging fruit. We need to send the
image data to the GPU for display anyway, so why not let it do some of the
work? But this depends on how the code is organized and how the output is
displayed.

The main issue is that we would like the
shootOneRay->traverseScene->evalHits->weaveBooleans->colorizePixel
steps to run as clearly separated stages over all tested rays, in order to
reduce the number of CL kernel calls and maximize coherence. From what I
understand, though, everything currently happens in one giant megakernel.
So perhaps a lot of effort should go into changing that before doing any CL
coding at all.
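
A minimal sketch of the two control-flow styles, in Python for brevity.
The stage bodies are toy stand-ins that I made up (they are not the real
librt routines); only the loop structure matters here.

```python
# Toy stand-ins for the real stages; each maps one ray record to the next.
def shoot_one_ray(pixel):
    return {"pixel": pixel}

def traverse_scene(ray):
    return {**ray, "candidates": [ray["pixel"] % 3]}

def eval_hits(ray):
    return {**ray, "hits": [(c, 1.0 + c) for c in ray["candidates"]]}

def weave_booleans(ray):
    return {**ray, "partitions": sorted(ray["hits"], key=lambda h: h[1])}

def colorize_pixel(ray):
    return 40 * len(ray["partitions"]) + ray["partitions"][0][0]

STAGES = [shoot_one_ray, traverse_scene, eval_hits,
          weave_booleans, colorize_pixel]

def render_depth_first(pixels):
    """Current style: run the whole pipeline for one pixel, then the next."""
    image = []
    for p in pixels:
        value = p
        for stage in STAGES:
            value = stage(value)
        image.append(value)
    return image

def render_staged(pixels):
    """Staged style: run each stage over all rays before the next stage."""
    batch, launches = list(pixels), 0
    for stage in STAGES:
        # Each pass over the batch would correspond to one CL kernel call.
        batch = [stage(item) for item in batch]
        launches += 1
    return batch, launches

image, launches = render_staged(range(8))
print(image == render_depth_first(range(8)), launches)
```

Both versions compute the same image; the staged one makes one pass per
stage regardless of the number of rays, which is the shape a CL port
would want.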

-- 
Vasco Alexandre da Silva Costa
PhD Student at Department of Information Systems and Computer Science
Instituto Superior Técnico/University of Lisbon, Portugal