On Mar 4, 2015, at 6:06 PM, Vasco Alexandre da Silva Costa 
<vasco.co...@gmail.com> wrote:

> I have been reading the 'librt' source code.

Excellent!

> There is some OpenCL prototype code in there for testing intersections with 
> ellipsoids. But this kind of architecture is not going to work performance 
> wise. It is allocating buffers and sending geometry data to the device (GPU 
> or whatever) on every single ray/primitive intersection call 
> (src/librt/primitives/sph/sph.c:clt_shot). I also checked the main 
> intersection code and it uses a function table to figure out which 
> ray/primitive intersection routine to call (src/librt/primitives/table.c)…

Astute observations.  The OpenCL code was intentionally structured that way, 
not for performance obviously, but so we could perform a one-to-one comparison 
to validate that the implementation's behavior was correct.  Our librt ray 
tracing logic is integrated into a number of scientific analysis applications, 
so we go to great lengths to ensure that we consistently report hits and 
misses, and that the ray intersection implementation is correct, especially 
given inexact floating point math.

Indeed, the first OpenCL translation of the ellipsoid logic, which seemed 100% 
identical to the C code, was wrong: it produced different results.  The issues 
were sorted out a few days later, but that level of rigor and validation is 
necessary.  Of course, this is not the desired end state, but it made the 
implementation easy to test.  The inefficient buffer allocation overhead was 
also intentional (and also temporary), just to get quick numbers on OpenCL's 
overhead costs.

Basically, it was a simple proof-of-concept and it was successful.  The next 
steps are to create a new intersection pipeline that is coherent from start to 
end with bundles of rays getting dispatched and evaluated against primitives 
without excessive branching and cache misses.

> Then I did some  'wc -l' runs to see how much effort it would take to recode 
> parts of the pipeline:
> src/librt/*.c              - 33516 lines
> src/librt/primitives/*.c   -  4653 lines
> src/librt/primitives/*/*.c - 141351 lines

Those are all the back-end portions of the pipeline.  You'll also need to 
consider the front-end, where ray dispatching occurs, in 
src/rt/(do|main|opt|view|worker).c.

That said, it’s both better and worse than those numbers make it seem.  You’ve 
encompassed far too much code in librt, much of it related to .g file 
management.  You’re also not looking at the code that some of those primitives 
rely on.  For example, the “brep” primitive is built on src/libbrep and 
src/other/openNURBS, and some of the primitives rely on our src/libbn math 
library extensively for polynomial root solving.

> Most code is ray/primitive intersection routines. I have to read some more to 
> see how the librt object database is stored, updated, and accessed. But I 
> think the most viable architecture is probably going to be something like... 
> we store some accelerated primitives on the device (GPU), and unaccelerated 
> primitives on the host (CPU). Then when we do the ray shooting we perform it 
> on both databases and merge the results. In theory the more accelerated 
> primitives (GPU) we have the smaller the unaccelerated primitives database 
> (CPU) will be and the faster the ray shooting will be. As a proof of concept 
> we can reuse the accelerated ellipsoid intersection code over this 
> architecture.

The way ray tracing currently occurs, there's even room to speed up primitives 
on the host (CPU) by an order of magnitude or better by ensuring better data 
coherency and eliminating branch logic.  This will require modifying the 
dispatcher (src/rt/worker.c), the shot routines (src/librt/primitives/*/*.c), 
the boolean evaluator (src/librt/bool.c), and the optical renderer 
(src/rt/view.c and src/rt/do.c).

On a minor related note, the implementation should not ever assume or require 
the availability of a GPU.  It can and should leverage GPU cores when they are 
available.  Again, there are substantial performance gains to be had even on 
the CPU.

> We need to perform some kind of analysis on which primitives people are most 
> likely to use and work on accelerating them. So basically we need some 
> statistics on a corpus of scenes i.e. how many primitives each model total 
> has and how many primitives of each type each scene has as a percentage.

Also, while we have 24+ different primitives and intersection routines, the 
most important are just these six: ell, tgc, arb8, tor, bot, and brep.  For 
GSoC, I would recommend focusing on only the first four, or even just ell, 
since its shot routine is already complete and validated; that would let you 
focus on getting the rest of the pipeline working.

> I haven't checked if this is implemented yet but it would be nice to have 
> some built-in statistics like framerate or geometrical complexity that could 
> be displayed on user demand via the UI.

There are already built-in statistics that get calculated during rendering for 
rays/sec, prep overhead, and much more.  The interactive raytracing interface 
we have for bot (triangle mesh) geometry, called isst, provides a UI for 
on-demand fps and complexity information.

Cheers!
Sean


_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel
