Hi Vasco,

Thanks for the help and for the explanation!

Regards,

> On 18 May 2017, at 00:42, Vasco Alexandre da Silva Costa 
> <vasco.co...@gmail.com> wrote:
> 
> On Wed, May 17, 2017 at 6:20 PM, Marco Domingues <marcodomingue...@gmail.com 
> <mailto:marcodomingue...@gmail.com>> wrote:
> Hi,
> 
> When looking at the Boolean evaluation code, I noticed the use of struct 
> bu_bitv (struct bu_bitv *solidbits) to test whether a given region is ready to be 
> evaluated, by checking if every solid in the region has been intersected 
> (function ‘bool_partition_eligible’).
> 
> From my understanding, these tests do not need to be ported to OpenCL, 
> because at the moment of boolean evaluation every solid is guaranteed to 
> have been intersected, given the sequence of OCL kernels:
> 
> count_hits()
> store_segs()
> weave_segs()*
> eval_partitions()*
> shade_segs()
> 
> *yet to be implemented
> 
> Is this correct? Or is struct bu_bitv used for something else in the 
> background, other than just checking whether every solid has been intersected?
> 
> Right. If I remember correctly, that bitvector is used to ensure that we do 
> not intersect each object more than once. It is basically a performance 
> optimization: if the acceleration structure is a spatial partition like a 
> kd-tree or a grid, an object that overlaps several cells could otherwise be 
> intersected multiple times along the same ray. But the thing is, I 
> implemented a BVH (object partition) acceleration structure for the OpenCL 
> version of the raytracer code. In a BVH you will never compute an 
> intersection with any object more than once along the same ray; that's just 
> how it is. So the final list should already have no duplicates.
>  
> Hence we should not need the bitvector at all. That is why I used a BVH in 
> the first place. Having a per-thread bitvector with one bit per primitive on 
> a GPU, with possibly thousands of threads in flight, would use a lot of cache 
> space, which would make the intersection computations really slow. You want to 
> minimize the amount of temporaries per thread on a GPU so you can have more 
> threads in flight without waiting for accesses to DRAM, which are really 
> painfully slow in comparison to the cache.
> 
> Regards,
> 
> -- 
> Vasco Alexandre da Silva Costa
> PhD in Computer Engineering (Computer Graphics)
> Instituto Superior Técnico/University of Lisbon, Portugal
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org <http://slashdot.org/>! 
> http://sdm.link/slashdot
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net <mailto:brlcad-devel@lists.sourceforge.net>
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel 
> <https://lists.sourceforge.net/lists/listinfo/brlcad-devel>