On Mar 21, 2015, at 12:55 PM, Vasco Alexandre da Silva Costa 
<vasco.co...@gmail.com> wrote:

> I looked some more at the code and yeah, the weave booleans step is branch 
> heavy and has gotos in it. So this is most likely a bad candidate for GPU 
> acceleration, since a GPU will have roughly a quarter of the clock speed and 
> worse branch prediction hardware than the CPU. We want something that is 
> compute heavy and has low thread divergence.

I wouldn’t dismiss it because of how it’s currently implemented.  Similar to 
the various research demonstrations years ago that showed how to encode a 
(branch-heavy) kd-tree efficiently on the GPU, it should be possible to do the 
same with Boolean weaving.  The algorithm is very similar to depth buffering 
and depth peeling.  Like I said, you could almost certainly land a SIGGRAPH or 
JCGT paper on this particular topic.  It’s an under-explored area of research.
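
For intuition, here’s a minimal sketch of the weave recast as a sorted 
boundary-event sweep, which is the same shape as depth peeling.  The event 
struct, the bitmask encoding, and the eval_csg() callback are all hypothetical 
illustrations, not librt’s actual representation:

    #include <stdlib.h>

    struct event {
        double dist;     /* distance along the ray */
        int solid;       /* which solid's boundary this is (< 64) */
        int entering;    /* 1 = ray enters the solid, 0 = exits */
    };

    /* Hypothetical: evaluates the boolean tree given a bitmask of
     * which solids the ray is currently inside. */
    extern int eval_csg(unsigned long long inside_mask);

    static int cmp_event(const void *a, const void *b)
    {
        double d = ((const struct event *)a)->dist
                 - ((const struct event *)b)->dist;
        return (d > 0) - (d < 0);
    }

    /* Sweep boundary events in depth order: at each boundary, flip
     * that solid's inside/outside bit and re-test the expression.
     * Returns the entry distance of the first span that is "in",
     * or a negative value on a miss. */
    double first_hit(struct event *ev, size_t nev)
    {
        unsigned long long inside = 0;  /* bit i = inside solid i */
        size_t i;

        qsort(ev, nev, sizeof(*ev), cmp_event);
        for (i = 0; i < nev; i++) {
            if (ev[i].entering)
                inside |= 1ULL << ev[i].solid;
            else
                inside &= ~(1ULL << ev[i].solid);
            if (eval_csg(inside))
                return ev[i].dist;
        }
        return -1.0;
    }

Written this way there are no gotos: each ray is an independent sort-plus-sweep, 
which is exactly the one-work-item-per-ray shape a GPU wants.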

> So I think it is best to focus on ray dispatch+traversal at this time. 
> Generate rays in bundles, traverse some sort of acceleration structure, 
> gather hit candidate IDs, and compute hit intersections on the CPU. Not 
> having the primitive tests on the GPU will reduce the performance advantage, 
> though, since we will spend time transmitting data back and forth.

This also sounds reasonable to me.  It will almost certainly involve more 
modifications to our front-end application logic (and, obviously, replacing 
the back-end traversal logic).  That’s considerably more code to become 
familiar with than Boolean weaving, but it’s not nearly as complicated or 
validation-sensitive.

In the end, it’s probably a wash: less code with a harder algorithm versus 
more code with an easier algorithm.  Both are highly valuable, so propose 
whichever piques your interest!
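
To make the staged dispatch concrete, here’s a rough host-side sketch of one 
batch, assuming OpenCL.  The "traverse" kernel and the buffer layout are 
invented for illustration, and error checking is omitted:

    #include <CL/cl.h>

    /* One batch of the staged pipeline: rays in, candidate ids out,
     * exact intersection left to the CPU (librt). */
    void dispatch_batch(cl_command_queue q, cl_kernel traverse,
                        cl_mem d_rays, cl_mem d_candidates,
                        size_t nrays, cl_uint *h_candidates)
    {
        size_t gws = nrays;  /* one work-item per ray */

        /* Stage 1: the caller generated a ray bundle and uploaded
         * it to d_rays. */

        /* Stage 2: traverse the acceleration structure on the GPU,
         * writing one candidate primitive id per ray. */
        clSetKernelArg(traverse, 0, sizeof(cl_mem), &d_rays);
        clSetKernelArg(traverse, 1, sizeof(cl_mem), &d_candidates);
        clEnqueueNDRangeKernel(q, traverse, 1, NULL, &gws, NULL,
                               0, NULL, NULL);

        /* Stage 3: gather the candidate ids back to the host.  This
         * blocking readback is the transfer cost mentioned above. */
        clEnqueueReadBuffer(q, d_candidates, CL_TRUE, 0,
                            nrays * sizeof(cl_uint), h_candidates,
                            0, NULL, NULL);

        /* Stage 4: the caller runs exact primitive tests on the CPU
         * against the gathered candidates. */
    }

The larger the batch, the better the readback cost amortizes, which is another 
argument for generating rays in bundles.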

> The shading pass also seems like low-hanging fruit. We need to send the 
> image data to the GPU for display anyway, so why not let it do some of the 
> work. But this depends on how the code is organized and how output is 
> displayed.

For some shaders, this is certainly possible, but I’m not married to our 
shader system.  I’m more inclined to leverage someone else’s rendering 
infrastructure (e.g., OSL and/or Appleseed) for anything other than simple 
Phong/flat shading.
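
For the simple case, the shading stage really is tiny.  Here’s a sketch of a 
flat Lambert-style kernel in OpenCL C; the buffer layout is invented for this 
example and isn’t liboptical’s actual data:

    /* Per-pixel diffuse shading.  Normals are unit vectors stored
     * with w = 0 so the 4-component dot product is harmless. */
    __kernel void shade(__global const float4 *normals,  /* hit normals */
                        __global const float4 *colors,   /* base colors */
                        __global uchar4 *pixels,         /* output image */
                        float4 light_dir)                /* unit vector */
    {
        size_t i = get_global_id(0);

        /* N.L diffuse term with a small ambient floor. */
        float ndotl = fmax(dot(normals[i], light_dir), 0.1f);
        float4 c = clamp(colors[i] * ndotl, 0.0f, 1.0f);
        pixels[i] = convert_uchar4(c * 255.0f);
    }

Since the result is already on the GPU, displaying it costs nothing extra, 
which is exactly the point about sending the image data over anyway.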

> The main issue is that we would like the 
> shootOneRay->traverseScene->evalHits->weaveBooleans->colorizePixel steps 
> to be done in clearly separated stages for all tested rays, in order to 
> reduce the number of CL kernel calls and maximize coherence, but from what I 
> understand everything currently happens in one giant megakernel. So perhaps 
> a lot of effort should be spent restructuring this before doing any CL 
> coding at all.

Almost certainly.  There’s a front-end layer (rt) that dispatches to a 
back-end layer (librt) for traversed intersections; the front end then sends 
the results to another layer (liboptical) for coloring/effects, and all of 
that is composited by the front end into an image (via libicv/libfb).  The 
application layer (used by dozens of applications) is somewhat relevant and 
described in detail here: http://brlcad.org/wiki/Developing_applications
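
The wiki page walks through it, but for a sense of scale, a minimal librt 
application has roughly this shape (no error checking; see the wiki for the 
complete version):

    #include "vmath.h"
    #include "raytrace.h"

    static int hit(struct application *ap, struct partition *part,
                   struct seg *segs)
    {
        /* By the time this runs, librt has already traversed,
         * intersected, and boolean-woven: 'part' is the sorted
         * list of in/out segments along this ray. */
        return 1;
    }

    static int miss(struct application *ap)
    {
        return 0;
    }

    int main(int argc, char *argv[])
    {
        struct application ap;
        struct rt_i *rtip;
        char title[1024] = {0};

        rtip = rt_dirbuild(argv[1], title, sizeof(title)); /* .g file */
        rt_gettree(rtip, argv[2]);                         /* object */
        rt_prep_parallel(rtip, 1);          /* build acceleration data */

        RT_APPLICATION_INIT(&ap);
        ap.a_rt_i = rtip;
        ap.a_hit = hit;    /* receives the woven partitions */
        ap.a_miss = miss;

        VSET(ap.a_ray.r_pt, 0.0, 0.0, 10000.0);  /* ray origin */
        VSET(ap.a_ray.r_dir, 0.0, 0.0, -1.0);    /* ray direction */
        rt_shootray(&ap);  /* one dispatch + traversal + weave */

        return 0;
    }

The megakernel concern maps directly onto this: rt_shootray() runs the whole 
shootOneRay->traverseScene->evalHits->weaveBooleans chain per ray, so batching 
those stages across rays means restructuring above and below that call.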

Cheers!
Sean

