OK, the Boolean weaving seems more interesting. Besides it being possible to
store the tree similarly to a kd-tree with ropes, like you said, we can
probably also do some static optimization on the logic expressions. For
example, if we have an intersection operation and we miss one of the
primitives in the set, we don't need to test the others. On an intersection
set we should probably also test the smallest primitive first, since a miss
is more likely there and gets us to the final result faster... I can see us
doing static optimizations like that.
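To make the idea concrete, here is a minimal C sketch of both optimizations: a static pass that sorts intersection children smallest-first (using estimated surface area as a stand-in for hit probability), and short-circuit evaluation that stops testing an intersection as soon as one member misses. The node layout, field names, and the boolean-valued hit test are all hypothetical simplifications; real Boolean weaving works on ray partitions/segments, not single hit flags.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node types; BRL-CAD's real tree uses different structures. */
enum op { OP_PRIMITIVE, OP_INTERSECT, OP_UNION };

struct node {
    enum op op;
    double est_area;                  /* estimated surface area: proxy for hit probability */
    int (*hit)(const struct node *);  /* simplified boolean ray/primitive test */
    struct node **child;
    size_t nchild;
};

/* Ascending comparison on estimated area. */
static int cmp_area(const void *a, const void *b)
{
    const struct node *x = *(const struct node *const *)a;
    const struct node *y = *(const struct node *const *)b;
    return (x->est_area > y->est_area) - (x->est_area < y->est_area);
}

/* Static pass: order intersection children smallest-first so a likely
 * miss is discovered early and the remaining tests are skipped. */
void optimize(struct node *n)
{
    size_t i;
    if (n->op == OP_INTERSECT)
        qsort(n->child, n->nchild, sizeof(struct node *), cmp_area);
    for (i = 0; i < n->nchild; i++)
        optimize(n->child[i]);
}

/* Dynamic short-circuiting: one miss kills an intersection,
 * one hit settles a union. */
int eval(const struct node *n)
{
    size_t i;
    if (n->op == OP_PRIMITIVE)
        return n->hit(n);
    for (i = 0; i < n->nchild; i++) {
        int h = eval(n->child[i]);
        if (n->op == OP_INTERSECT && !h)
            return 0;                 /* skip the remaining primitive tests */
        if (n->op == OP_UNION && h)
            return 1;
    }
    return n->op == OP_INTERSECT;     /* empty-loop fallthrough result */
}

/* Demo primitive tests standing in for real ray/primitive intersection. */
static int always_miss(const struct node *n) { (void)n; return 0; }
static int always_hit(const struct node *n)  { (void)n; return 1; }
```

The static sort is the compile-time part; the early-out in eval() is where the saved primitive tests come from at trace time.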

So I propose this timetable:
- 2 weeks: designing algorithms for static and/or dynamic Boolean weaving
optimizations, like the ones mentioned above.
- 1 week: designing a way to store the logic tree.
- 3 weeks: related literature search (e.g. static compiler optimizations,
ray-tracing surface-area-heuristic papers, spatial tree storage on the GPU,
etc.) to see if we can improve further on the proposed algorithms.
- 3 weeks: coding a logic tree database (construction, query routines).
- 4 weeks: coding static optimizations (e.g. annotating or rebalancing the
tree).

How does this sound?



On Mon, Mar 23, 2015 at 3:33 PM, Christopher Sean Morrison <brl...@mac.com>
wrote:

>
> On Mar 21, 2015, at 12:55 PM, Vasco Alexandre da Silva Costa <
> vasco.co...@gmail.com> wrote:
>
> > I looked some more at the code and yeah the weave booleans step is
> branch heavy and has gotos in it. So this is most likely a bad candidate
> for GPU acceleration since a GPU will have like a quarter of the clockspeed
> and worse branch prediction hardware than the CPU. We want something that
> is compute heavy and has low thread divergence.
>
> I wouldn’t discredit it because of how it’s currently implemented.
> Similar to the various research demonstrations years ago that showed how to
> encode a (branch-heavy) kd-tree efficiently on the GPU, it should be
> possible to do the same with Boolean weaving.  The algorithm is very
> similar to depth buffering and depth peeling.  Like I said, you could
> almost certainly land a Siggraph or JCGT paper on this particular topic.
> It’s an under-explored area of research.
>
> > So I think it is best to focus on ray dispatch+traversal at this time.
> Generate rays in bundles, traverse some sort of acceleration structure,
> gather hit candidate ids, and compute hit intersections on CPU. Not having
> the primitive tests on the GPU will reduce the performance advantage since
> we will spend time transmitting data back and forth though.
>
> This also sounds reasonable to me.  This will almost certainly involve
> more modifications to our front-end application logic (and obviously
> replaced back-end traversal logic).  Considerably more code that you’ll
> have to become familiar with compared with Boolean weaving, but not nearly
> as complicated or validation-sensitive.
>
> In the end, probably a tradeoff wash.  Less code, harder algo vs More
> code, easier algo.  Both highly valuable, so propose whatever piques your
> interest!
>
> > The shading pass also seems like low hanging fruit. We need to send the
> image data to the GPU to display anyway so why not let it do some of the
> work. But this depends on how the code is organized and output is displayed.
>
> For some shaders, this is certainly possible, but I’m not married to our
> shader system.  More inclined to leverage someone else’s rendering
> infrastructure (e.g., OSL and/or Appleseed) for anything other than simple
> phong/flat shading.
>
> > The main issue is that we would like that the
> shootOneRay->traverseScene->evalHits->weaveBooleans->colorizePixel steps
> would be done in clearly separated stages for all tested rays, in order to
> reduce the amount of CL kernel calls and maximize coherence, but from what
> I understand everything currently happens in one giant megakernel. So
> perhaps a lot of effort should be spent changing this way of things before
> doing any CL coding at all.
>
> Almost certainly.  There’s a front-end layer (rt) that dispatches to a
> back end layer (librt) for traversed intersections, then the front-end
> sends the results to another layer (liboptical) for coloring/effects, and
> that all is then composited by the front-end into an image (via
> libicv/libfb).  The application layer (used by dozens of applications) is
> somewhat relevant and described in detail here:
> http://brlcad.org/wiki/Developing_applications
>
> Cheers!
> Sean
>
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel
>



-- 
Vasco Alexandre da Silva Costa
PhD Student at Department of Information Systems and Computer Science
Instituto Superior Técnico/University of Lisbon, Portugal