On Wed, Jun 21, 2017 at 4:28 PM, Vasco Alexandre da Silva Costa <
vasco.co...@gmail.com> wrote:

> On Wed, Jun 21, 2017 at 1:12 PM, Marco Domingues <
> marcodomingue...@gmail.com> wrote:
>
>> I followed your suggestion and added a lighting model to the ANSI C code
>> that does the white shading :) I also uploaded some new images of the
>> results and added a table with the timing comparison.
>>
>
> Right now you have rt_boolweave() and bool_eval() working. This won't be
> complete until you implement rt_boolfinal(). So yes, there are still some
> limitations, but those should be solved further along the schedule.
>
>
>> Running the OCL code on my CPU is faster than running it on an Nvidia GTX
>> 970, which isn't a great board for double-precision calculations. I
>> have included those times for reference.
>>
>
> http://brlcad.org/wiki/User:Marco-domingues/GSoC17/Log#21_June
>
> There are lots of optimizations which can be done after we get things
> working:
> - There is some CPU <-> GPU traffic I left in between the count_hits()
> and store_hits() kernels to perform a prefix sum, which should be done on
> the GPU.
> - The RPN tree I implemented for bool_eval() is close to optimal in memory
> space and minimal in code, but not optimal in traversal steps. It should be
> possible to evaluate the expression in fewer steps with a different
> expression representation.
> - There's bitscanning going on in bool_eval() to check which objects are
> intersected in the partitions, which could probably be further optimized
> with bit ops (e.g. with OpenCL clz and shifts).
> - The data structure used to store the partitions is not optimal, since
> we're basically using a dynamic array when we could be using a dynamic
> (linked) list, given that partitions can be inserted in the middle of the
> partition list. This should make a difference in scenes with a lot of
> depth complexity (which is not the case in these tests).
>
> Plus, as Marco said, the GTX 970 is optimized for gaming; it just doesn't
> have a lot of DP FLOPS.
> The GTX 970 has 190 DP FLOPS; compare that with the 3494 SP FLOPS it can
> achieve. In comparison, my old GTX TITAN card has 1300 DP FLOPS and ~4500 SP
> FLOPS. A modern V100 accelerator can do about 7450 DP FLOPS. It also costs
> an arm and a leg, though: a workstation with 4x V100 accelerators costs $69k.
>
>
PS: That's supposed to be GFLOPS. Not FLOPS. :-P
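Regarding the first item in the quoted list (the prefix sum between count_hits() and store_hits()): below is a minimal sketch of a work-efficient exclusive scan (Blelloch's up-sweep/down-sweep). It assumes count_hits() yields one hit count per ray and store_hits() needs each ray's starting offset in the output buffer. It is written as sequential C for clarity. Each inner-loop iteration is independent and would map to one work-item, with a barrier between the outer steps. The function name is hypothetical, and n is assumed to be a power of two.

```c
#include <stddef.h>

/* Hypothetical helper: in-place exclusive prefix sum (Blelloch scan).
 * a[i] holds per-ray hit counts on entry and per-ray output offsets on exit.
 * Assumes n is a power of two. Inner-loop iterations are independent, so on
 * a GPU each would be one work-item, with a barrier between outer steps. */
void exclusive_scan_blelloch(unsigned *a, size_t n)
{
    size_t d, i;

    /* Up-sweep (reduce): build partial sums in place. */
    for (d = 1; d < n; d *= 2)
        for (i = 2 * d - 1; i < n; i += 2 * d)
            a[i] += a[i - d];

    /* Clear the root, then down-sweep to distribute the partial sums. */
    if (n > 0)
        a[n - 1] = 0;
    for (d = n / 2; d >= 1; d /= 2) {
        for (i = 2 * d - 1; i < n; i += 2 * d) {
            unsigned t = a[i - d];
            a[i - d] = a[i];
            a[i] += t;
        }
    }
}
```

For per-ray counts {3, 1, 7, 0, 4, 1, 6, 3} this produces the offsets {0, 3, 4, 11, 11, 15, 16, 22}. The total hit count is the last offset plus the last count (25), which is what a hit buffer would need to be sized for.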
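On the RPN item: this is a sketch of a stack-based RPN boolean evaluation for one partition. It assumes a hypothetical token encoding (non-negative tokens are solid indices, negative ones are operators) and a 64-bit mask of the solids present in the partition. The actual encoding in the OpenCL bool_eval() may differ. It visits every token, so an expression with n leaves always costs 2n - 1 steps with no short-circuiting. That per-token traversal cost is what a different representation could reduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token encoding: tokens >= 0 are solid indices (0..63),
 * negative tokens are boolean operators. */
enum { OP_UNION = -1, OP_INTERSECT = -2, OP_SUBTRACT = -3 };

#define RPN_STACK_MAX 64

/* Evaluate an RPN boolean expression for one partition. Bit i of
 * `solids_in_partition` is set if solid i is present in the partition.
 * Every token costs one step: operands push, operators pop two and push one. */
bool rpn_eval(const int *tokens, size_t ntokens,
              unsigned long long solids_in_partition)
{
    bool stack[RPN_STACK_MAX];
    size_t sp = 0;

    for (size_t k = 0; k < ntokens; k++) {
        int t = tokens[k];
        if (t >= 0) {
            assert(t < 64 && sp < RPN_STACK_MAX);
            stack[sp++] = (solids_in_partition >> t) & 1u;
        } else {
            assert(sp >= 2);
            bool rhs = stack[--sp];
            bool lhs = stack[--sp];
            switch (t) {
            case OP_UNION:     stack[sp++] = lhs || rhs;  break;
            case OP_INTERSECT: stack[sp++] = lhs && rhs;  break;
            case OP_SUBTRACT:  stack[sp++] = lhs && !rhs; break;
            default:
                assert(!"unknown operator");
                stack[sp++] = false;
            }
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

For example, (A u B) - C with A, B, C as solids 0, 1, 2 is encoded as {0, 1, OP_UNION, 2, OP_SUBTRACT}. A partition containing A and C then evaluates to false.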
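On the bitscanning item: this is a sketch contrasting a per-bit scan with a clz-based one. It assumes the solids present in a partition are stored as a bitset of 32-bit words, where bit i of word w is solid 32*w + i. The function names are hypothetical. __builtin_clz is a GCC/Clang builtin; OpenCL C offers clz() as a built-in integer function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical naive scan: test every bit of every word. */
size_t list_solids_naive(const uint32_t *words, size_t nwords, unsigned *out)
{
    size_t n = 0, w;
    unsigned b;
    for (w = 0; w < nwords; w++)
        for (b = 0; b < 32; b++)
            if (words[w] & (UINT32_C(1) << b))
                out[n++] = (unsigned)(w * 32 + b);
    return n;
}

/* Hypothetical clz-based scan: jump straight to the highest remaining set
 * bit, record it, clear it with a shift/mask, and repeat. __builtin_clz is
 * undefined for 0, hence the loop condition. Bits come out highest-first
 * within each word. */
size_t list_solids_clz(const uint32_t *words, size_t nwords, unsigned *out)
{
    size_t n = 0, w;
    for (w = 0; w < nwords; w++) {
        uint32_t bits = words[w];
        while (bits != 0) {
            unsigned b = 31u - (unsigned)__builtin_clz(bits);
            out[n++] = (unsigned)(w * 32 + b);
            bits &= ~(UINT32_C(1) << b);
        }
    }
    return n;
}
```

The clz loop does work proportional to the number of set bits rather than 32 steps per word, which matters when only a few solids overlap a partition.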
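On the partition storage item: this is a sketch of a GPU-friendly alternative to shifting a dynamic array. It assumes each ray gets a preallocated pool of partitions that are linked by array indices instead of pointers. The struct and field names are hypothetical, not the ones in the OpenCL code. Finding the insertion point is still a linear walk, but linking in a new partition rewrites one index instead of moving every later element.

```c
#include <stddef.h>

#define NIL (-1)

/* Hypothetical partition record. */
struct part {
    double inhit_dist;   /* distance along the ray where the partition starts */
    double outhit_dist;  /* distance where it ends */
    int next;            /* index of the next partition in ray order, or NIL */
};

/* Hypothetical per-ray pool: partitions are appended to `pool` and never
 * moved; ray order is kept through the `next` links starting at `head`. */
struct part_list {
    struct part *pool;
    int capacity;
    int used;
    int head;
};

/* Insert a partition, keeping the list sorted by inhit_dist.
 * Returns the new partition's index, or NIL if the pool is full. */
int part_insert(struct part_list *l, double in, double out)
{
    int idx, prev = NIL, cur;

    if (l->used >= l->capacity)
        return NIL;
    idx = l->used++;
    l->pool[idx].inhit_dist = in;
    l->pool[idx].outhit_dist = out;

    /* Walk to the insertion point: O(n) to find, O(1) to link. */
    for (cur = l->head; cur != NIL && l->pool[cur].inhit_dist <= in;
         cur = l->pool[cur].next)
        prev = cur;

    l->pool[idx].next = cur;
    if (prev == NIL)
        l->head = idx;
    else
        l->pool[prev].next = idx;
    return idx;
}
```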

-- 
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel
