Hello,
I came up with this tentative work plan for improving the OpenCL (CL) ray
tracing (RT) code in BRL-CAD:
- Hook into the shot and db load routines (hooks disabled by default) in
order to capture ellipsoid primitives and shots on the CL side. Get familiar
with these top-level interfaces of the code. (2 weeks)
- CL megakernel for boolean operations. I need to check how involved this
will be before I can give a more accurate time estimate. It probably
requires storing some primitive hierarchy on the CL side. Integration with
the C++ side of things for other primitives may be problematic. (2 weeks?)
- Implement regular grid spatial partitioning construction and traversal to
accelerate CL ellipsoid shots. Requires integrating the CL ellipsoid BBox
code. (2 weeks)
- Implement CL rectilinear grids [1] (an improvement on the Mike Gigante
Nugrid currently used in BRL-CAD) spatial partitioning construction and
traversal. Should reuse most of the regular grid construction code but
requires some extra construction steps and has a different traversal
scheme. (2.5 weeks)
- Support multiple rays in batches to reduce the number of CL kernel calls.
May require getting familiar with a different part of the codebase, as this
is a higher-level interface. (2 weeks)
- Cleanups, bugfixes, final tests, docs. (2.5 weeks)
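For the regular grid item above, the traversal I have in mind is the usual
3D-DDA walk (Amanatides and Woo style). Here is a minimal sketch in plain C,
not the actual BRL-CAD or CL code. GRID_N, struct ray and grid_walk are
hypothetical names, the grid is assumed to be GRID_N^3 unit cells starting at
the origin, and the ray origin is assumed to lie inside the grid.

```c
#include <assert.h>
#include <float.h>

#define GRID_N 4  /* hypothetical: number of unit cells per axis */

struct ray { double o[3]; double d[3]; };  /* origin and direction */

/* Walk a ray through the grid, recording each visited cell in cells[][3].
 * Returns the number of cells visited (0 if the origin is outside the grid).
 * Assumption: cells are unit cubes covering [0, GRID_N)^3. */
int grid_walk(const struct ray *r, int cells[][3], int max_cells)
{
    int idx[3], step[3], n = 0;
    double t_max[3], t_delta[3];

    assert(max_cells > 0);

    for (int a = 0; a < 3; a++) {
        if (r->o[a] < 0.0 || r->o[a] >= GRID_N)
            return 0;
        idx[a] = (int)r->o[a];  /* truncation equals floor for o >= 0 */
        if (r->d[a] > 0.0) {
            step[a] = 1;
            t_delta[a] = 1.0 / r->d[a];
            t_max[a] = (idx[a] + 1 - r->o[a]) / r->d[a];
        } else if (r->d[a] < 0.0) {
            step[a] = -1;
            t_delta[a] = -1.0 / r->d[a];
            t_max[a] = (idx[a] - r->o[a]) / r->d[a];
        } else {
            step[a] = 0;  /* ray is parallel to this axis' cell walls */
            t_delta[a] = t_max[a] = DBL_MAX;
        }
    }

    while (n < max_cells) {
        cells[n][0] = idx[0];
        cells[n][1] = idx[1];
        cells[n][2] = idx[2];
        n++;

        /* Step along the axis whose next cell wall is closest. */
        int a = 0;
        if (t_max[1] < t_max[a]) a = 1;
        if (t_max[2] < t_max[a]) a = 2;
        if (step[a] == 0)
            break;  /* zero direction vector: nothing more to visit */

        idx[a] += step[a];
        if (idx[a] < 0 || idx[a] >= GRID_N)
            break;  /* left the grid */
        t_max[a] += t_delta[a];
    }
    return n;
}
```

In the real code each visited cell would hold a list of primitives to
intersect, and the walk would stop at the first hit inside the current cell.
The rectilinear grid should keep the same loop structure but with per-axis
split planes instead of a constant cell size.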
=== Notes
I am fairly confident in the time estimates I made for the CL side of
things but am unsure about the BRL-CAD integration. Perhaps you know better
how involved these tasks would be? Especially the CL boolean ops with mixed
C++ and CL primitive shots. I think it might be a hard nut to crack, but it
must be done eventually in order to reduce the CL kernel call overhead. To
do that we must be able to process many rays in one CL kernel call. If this
needs to be split into more subtasks I may cut some of the grid work to
balance the work plan.
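To illustrate what I mean by processing many rays per kernel call, here is a
rough host-side sketch in plain C. It only counts dispatches and does not
touch the OpenCL API. BATCH_CAP, struct ray_batch, batch_push and batch_flush
are hypothetical names, not existing librt interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAP 256  /* hypothetical: rays per kernel launch */

struct ray { double o[3]; double d[3]; };  /* origin and direction */

struct ray_batch {
    struct ray rays[BATCH_CAP];
    size_t count;     /* rays currently queued */
    size_t launches;  /* number of dispatches so far */
};

/* Dispatch all queued rays at once. In real code this is where the rays
 * would be copied to a device buffer and a single kernel enqueued with a
 * global work size of b->count. Here we only count the launch. */
void batch_flush(struct ray_batch *b)
{
    if (b->count == 0)
        return;
    b->launches++;
    b->count = 0;
}

/* Queue one ray. The batch is dispatched automatically when it fills up. */
void batch_push(struct ray_batch *b, const struct ray *r)
{
    assert(b->count < BATCH_CAP);
    b->rays[b->count++] = *r;
    if (b->count == BATCH_CAP)
        batch_flush(b);
}
```

With this scheme the per-call overhead is paid once per BATCH_CAP rays
instead of once per ray. The caller has to call batch_flush at the end to
dispatch the last partial batch.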
[1] http://web.ist.utl.pt/~vasco.costa/uploads/Main/crgrtfixed.pdf
All the best,
On Thu, Mar 5, 2015 at 2:27 PM, Vasco Alexandre da Silva Costa <
vasco.co...@gmail.com> wrote:
> On Thu, Mar 5, 2015 at 6:24 AM, Christopher Sean Morrison <brl...@mac.com>
> wrote:
>
>>
>> Astute observations. The OpenCL code was intentionally structured that
>> way, not at all for performance obviously, but so we could perform a 1-1
>> comparison to validate that the implementation behavior was correct. Our
>> librt ray tracing logic is integrated in a number of scientific analysis
>> applications, so we go to great lengths to ensure we consistently report
>> hits/misses, that the ray intersection implementation is correct especially
>> given inaccurate floating point math.
>>
>> Indeed, the first OpenCL translation of the ellipsoid logic — which
>> seemed 100% identical to the C code — was wrong. It resulted in different
>> calculations. The issues were sorted out a few days later, but this level
>> of rigor and validation is necessary. Of course, this is not the desired
>> end-state, but was done to easily test the implementation. The inefficient
>> buffer allocation overhead was also intentional (and also temporary) just
>> to get quick numbers on OpenCL’s overhead costs.
>>
>
> I see. But double precision floating point math is a lot slower than
> single precision on certain architectures like the lower end NVIDIA cards.
> e.g. the GeForce GTX 780 Ti has 5048 SP GFLOPS but only 210 DP GFLOPS. Even
> on the higher end NVIDIA cards the DP FLOPS are like a third of the SP
> FLOPS. The rendering system should be tunable for either speed or accuracy.
> That kind of complicates things. As a first approach we can work only on
> getting the double precision to work but the performance will suffer a
> great deal.
>
--
Vasco Alexandre da Silva Costa
PhD Student at Department of Information Systems and Computer Science
Instituto Superior Técnico/University of Lisbon, Portugal
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel