On Wed, Jun 21, 2017 at 1:12 PM, Marco Domingues <marcodomingue...@gmail.com
> wrote:
> I followed your suggestion and added a lighting model to the ansi c code
> that does the white shading :) And uploaded some more new images of the
> results and also added a table with the time comparison.
>
Right now you have rt_boolweave() and bool_eval() working. This won't be
complete until you implement rt_boolfinal(). So yes, there are still some
limitations, but those should be solved further along the schedule.
> Running the OCL code in my CPU is faster than running it on a Nvidia GTX
> 970, which isn't a great board to perform double-precision calculations. I
> have included those times for reference.
>
http://brlcad.org/wiki/User:Marco-domingues/GSoC17/Log#21_June
There are lots of optimizations which can be done after we get things
working:
- There is some CPU <-> GPU traffic I left in between the count_hits() and
store_hits() kernels to perform a prefix sum, which should be done on the
GPU.
- The RPN tree I implemented for bool_eval() is close to optimal in memory
use and code size, but not in traversal steps. It should be possible to
evaluate the expression in fewer steps with a different expression
representation.
- There's bit scanning going on in bool_eval() to check which objects are
intersected in the partitions, which could probably be further optimized
with bit ops (e.g. with OpenCL clz and shifts).
- The data structure that is used to store the partitions is not optimal.
We're basically using a dynamic array when we could be using a dynamic
list, and partitions can be inserted in the middle of the partition list,
which in an array means shifting every element after the insertion point.
This should make a difference in scenes with a lot of depth complexity
(which is not the case in these tests).
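
To make the first and third items above a bit more concrete, here is a
minimal host-side sketch in plain C. It is not the BRL-CAD code: the names
exclusive_scan() and list_set_bits() are made up for illustration, and the
scan is serial, so it only shows what a GPU prefix sum between count_hits()
and store_hits() would have to compute (per-ray output offsets plus the
total buffer size). The bit scan uses the GCC/Clang builtin __builtin_ctz
as a stand-in for the clz-and-shifts approach mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exclusive prefix sum: offsets[i] = counts[0] + ... + counts[i-1].
 * counts[i] would be the number of hits found for ray i in the counting
 * pass; offsets[i] is where ray i starts writing in the storing pass.
 * Returns the grand total, i.e. how many hit slots to allocate. */
static uint32_t exclusive_scan(const uint32_t *counts, uint32_t *offsets,
                               size_t n)
{
    uint32_t running = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        offsets[i] = running;
        running += counts[i];
    }
    return running;
}

/* Write the indices of the set bits of mask into out (lowest first) and
 * return how many there were. Instead of testing all 32 positions, each
 * iteration jumps straight to the lowest set bit and then clears it.
 * With only clz available, the same index is 31 - clz(mask & -mask). */
static size_t list_set_bits(uint32_t mask, unsigned *out)
{
    size_t n = 0;

    assert(out != NULL);
    while (mask != 0) {
        out[n++] = (unsigned)__builtin_ctz(mask); /* GCC/Clang builtin */
        mask &= mask - 1;                         /* clear lowest set bit */
    }
    return n;
}
```

For example, hit counts {2, 0, 3, 1} scan to offsets {0, 2, 2, 5} with a
total of 6, and the mask 0x8A yields the bit indices 1, 3 and 7.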
Plus, like Marco said, the GTX 970 is gaming optimized; it just doesn't
have a lot of DP FLOPS.
The GTX 970 has 190 DP GFLOPS; compare that with the 3494 SP GFLOPS it can
achieve. In comparison, my old GTX TITAN card has 1300 DP GFLOPS and ~4500
SP GFLOPS. A modern V100 accelerator can do about 7450 DP GFLOPS. It also
costs an arm and a leg, though. A workstation with 4x V100 accelerators
costs $69k.
Regards,
--
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel