You mentioned that:
"There is some OpenCL prototype code in there for testing intersections with
ellipsoids. But this kind of architecture is not going to work performance
wise. It is allocating buffers and sending geometry data to the device (GPU or
whatever) on every single ray/primitive intersection call
(src/librt/primitives/sph/sph.c:clt_shot). I also checked the main intersection
code and it uses a function table to figure out which ray/primitive
intersection routine to call (src/librt/primitives/table.c)"
Since this thread is about performance, I wanted to ask:
1. First, is there a BRL-CAD manual or guide book?
2. Is the main ray tracer written with the OpenCL library/code? And does this
affect the following files:
src/other/opennurbs
src/librt/primitives/brep
src/librt/opennurbs_ext.*
src/proc-db/csgbrep.cpp
src/proc-db/brep*
3. Can we modify these libraries (OpenCL, openNURBS from the Rhino3D folks),
or should we write a ray tracer from scratch, say if we need to optimize it?
4. Is NURBS used to find ray intersections with objects? Does the OpenCL code
do the same thing as NURBS, and which one is better?
I am interested in working on the "NURBS Optimization and Cleanup" project and
would like to optimize its algorithms, as this often plays a pivotal role in
improving performance. This is just one classic place to optimize.
To optimize the ray tracer, I am planning to take several of these steps:
1. Implementing a kd-tree structure.
2. Implementing the Bezier clipping method for NURBS surfaces.
3. Implementing a Bounding Volume Hierarchy (BVH).
4. Implementing BVH ray traversal.
5. Fine-tuning NURBS curves by specifying different control-point weights
and knot vectors.
Bounding Volume Hierarchy:
To reduce intersection cost, a bounding volume is placed around each primitive,
and the volumes are grouped into a tree; a ray that misses a parent volume can
then skip every primitive inside it.
You also mentioned that:
"The way ray tracing currently occurs, there’s even room to speed up primitives
on the host (CPU) by an order of magnitude or better by ensuring better data
coherency and eliminating branch logic."
Since you mentioned it, I would like to look further into this area.
From,
Benson
On Wednesday, March 4, 2015 11:25 PM, Christopher Sean Morrison
<brl...@mac.com> wrote:
On Mar 4, 2015, at 6:06 PM, Vasco Alexandre da Silva Costa
<vasco.co...@gmail.com> wrote:
> I have been reading the 'librt' source code.
Excellent!
> There is some OpenCL prototype code in there for testing intersections with
> ellipsoids. But this kind of architecture is not going to work performance
> wise. It is allocating buffers and sending geometry data to the device (GPU
> or whatever) on every single ray/primitive intersection call
> (src/librt/primitives/sph/sph.c:clt_shot). I also checked the main
> intersection code and it uses a function table to figure out which
> ray/primitive intersection routine to call (src/librt/primitives/table.c)…
Astute observations. The OpenCL code was intentionally structured that way,
obviously not for performance, but so we could perform a 1-1 comparison to
validate that the implementation behavior was correct. Our librt ray tracing
logic is integrated into a number of scientific analysis applications, so we
go to great lengths to ensure that we report hits/misses consistently and that
the ray intersection implementation is correct, especially given inexact
floating point math.
Indeed, the first OpenCL translation of the ellipsoid logic — which seemed 100%
identical to the C code — was wrong. It resulted in different calculations.
The issues were sorted out a few days later, but this level of rigor and
validation is necessary. Of course, this is not the desired end-state, but was
done to easily test the implementation. The inefficient buffer allocation
overhead was also intentional (and also temporary) just to get quick numbers on
OpenCL’s overhead costs.
Basically, it was a simple proof-of-concept and it was successful. The next
steps are to create a new intersection pipeline that is coherent from start to
end with bundles of rays getting dispatched and evaluated against primitives
without excessive branching and cache misses.
> Then I did some 'wc -l' runs to see how much effort it would take to recode
> parts of the pipeline:
> src/librt/*.c - 33516 lines
> src/librt/primitives/*.c - 4653 lines
> src/librt/primitives/*/*.c - 141351 lines
This is all the back-end portions of the pipeline. You’ll also need to
consider the front-end where ray dispatching occurs in
src/rt/(do|main|opt|view|worker).c
That said, it’s both better and worse than those numbers make it seem. You’ve
encompassed far too much code in librt, much of it related to .g file
management. You’re also not looking at the code that some of those primitives
rely on. For example, the “brep” primitive is built on src/libbrep and
src/other/openNURBS, and some of the primitives rely on our src/libbn math
library extensively for polynomial root solving.
> Most code is ray/primitive intersection routines. I have to read some more to
> see how the librt object database is stored, updated, and accessed. But I
> think the most viable architecture is probably going to be something like...
> we store some accelerated primitives on the device (GPU), and unaccelerated
> primitives on the host (CPU). Then when we do the ray shooting we perform it
> on both databases and merge the results. In theory the more accelerated
> primitives (GPU) we have the smaller the unaccelerated primitives database
> (CPU) will be and the faster the ray shooting will be. As a proof of concept
> we can reuse the accelerated ellipsoid intersection code over this
> architecture.
The way ray tracing currently occurs, there’s even room to speed up primitives
on the host (CPU) by an order of magnitude or better by ensuring better data
coherency and eliminating branch logic. This will require modifying the
dispatcher (src/rt/worker.c), the shot routine (src/librt/primitives/*/*.c),
the boolean evaluator (src/librt/bool.c), and the optical renderer
(src/rt/view.c and src/rt/do.c).
On a minor related note, the implementation should not ever assume or require
the availability of a GPU. It can and should leverage GPU cores when they are
available. Again, there are substantial performance gains to be had even on
the CPU.
> We need to perform some kind of analysis on which primitives people are most
> likely to use and work on accelerating them. So basically we need some
> statistics on a corpus of scenes i.e. how many primitives each model total
> has and how many primitives of each type each scene has as a percentage.
Also, while we have 24+ different primitives and intersection routines, the
most important are just these six: ell, tgc, arb8, tor, bot, and brep. For
GSoC, I would recommend focusing on only the first four, or even just ell
since its shot routine is already complete and validated, so you could focus
on getting the rest of the pipeline working.
> I haven't checked if this is implemented yet but it would be nice to have
> some built-in statistics like framerate or geometrical complexity that could
> be displayed on user demand via the UI.
There are already built-in statistics that get calculated during rendering for
rays/sec, prep overhead, and much more. The interactive raytracing interface
we have for bot (triangle mesh) geometry, called isst, provides a UI for
on-demand fps and complexity information.
Cheers!
Sean
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel