Hi Vasco,
Thanks for the help and for the explanation!
Regards,
> On 18 May 2017, at 00:42, Vasco Alexandre da Silva Costa
> <vasco.co...@gmail.com> wrote:
>
> On Wed, May 17, 2017 at 6:20 PM, Marco Domingues <marcodomingue...@gmail.com
> <mailto:marcodomingue...@gmail.com>> wrote:
> Hi,
>
> When looking at the Boolean evaluation code, I noticed the use of the struct
> bu_bitv (struct bu_bitv *solidbits) to test if a given region is ready to be
> evaluated, by checking if every solid in the region has been intersected
> (function ‘bool_partition_eligible’).
>
> From my understanding, these tests do not need to be ported to OpenCL,
> because at the moment of boolean evaluation, every solid is guaranteed to be
> intersected, considering the sequence of OCL kernels:
>
> count_hits()
> store_segs()
> weave_segs()*
> eval_partitions()*
> shade_segs()
>
> *yet to be implemented
>
> Is this correct? Or is the struct bu_bitv used for something else in the
> background other than just checking if every solid has been intersected?
>
> Right. If I remember correctly that bitvector is used to ensure that we do
> not intersect each object more than once. It is basically a performance
> optimization because, if the acceleration structure is a spatial partition
> like a kd-tree or a grid, it is possible that you will compute multiple
> intersections with the same object in the same ray otherwise. But the thing
> is, I implemented a BVH object partition acceleration structure for the
> OpenCL version of the raytracer code. In a BVH you will never compute an
> intersection with any object more than once along the same ray. That's just
> how it is. So the final list should have no duplicates already.
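
To make the duplicate-intersection point above concrete, here is a minimal sketch in plain C, not the actual BRL-CAD code. It assumes a spatial partition where the same primitive id can show up in several cells along one ray. NUM_PRIMS, struct visited_bits, test_and_set() and count_tests() are all hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scene size for this sketch; not a BRL-CAD constant. */
#define NUM_PRIMS 8
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
#define NUM_WORDS ((NUM_PRIMS + WORD_BITS - 1) / WORD_BITS)

/* One bit per primitive, reset at the start of every ray. */
struct visited_bits {
    unsigned words[NUM_WORDS];
};

/* Returns 1 if prim was already marked; otherwise marks it and returns 0. */
static int test_and_set(struct visited_bits *v, size_t prim)
{
    assert(prim < NUM_PRIMS);
    unsigned mask = 1u << (prim % WORD_BITS);
    unsigned *word = &v->words[prim / WORD_BITS];
    int was_set = (*word & mask) != 0;
    *word |= mask;
    return was_set;
}

/*
 * Walk the primitive ids found in the cells a ray passes through and
 * count how many intersection tests are actually run. With use_bits
 * set, a primitive that was already tested along this ray is skipped.
 */
static size_t count_tests(const size_t *cell_prims, size_t n, int use_bits)
{
    struct visited_bits v;
    size_t tests = 0;

    memset(&v, 0, sizeof v);
    for (size_t i = 0; i < n; i++) {
        if (use_bits && test_and_set(&v, cell_prims[i]))
            continue; /* already intersected along this ray */
        tests++;      /* stand-in for the real ray/primitive test */
    }
    return tests;
}
```

For the id sequence {2, 5, 2, 7, 5, 2}, count_tests() returns 6 without the bits and 3 with them. With a BVH each primitive sits in exactly one leaf, so the sequence would have no repeats to begin with and the bits would save nothing.
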
>
> Hence we should not need the bitvector at all. That is why I used a BVH in
> the first place. Having a per thread bitvector with one bit per primitive in
> a GPU, with possibly thousands of threads in flight, would use a lot of cache
> space which would make the intersection computations really slow. You want to
> minimize the amount of temporaries per thread on a GPU so you can have more
> threads in flight without waiting for accesses to DRAM, which are really
> painfully slow in comparison to the cache.
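
A rough back-of-the-envelope sketch of the memory cost described above, again in plain C. The helper names are hypothetical, and the primitive and thread counts in the note below are illustrative assumptions, not measurements from any real scene or GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one bit per primitive, rounded up to whole bytes. */
static size_t bitv_bytes(size_t num_prims)
{
    return num_prims / 8 + (num_prims % 8 != 0);
}

/* Total storage if every in-flight thread keeps its own bitvector. */
static size_t total_bitv_bytes(size_t num_prims, size_t num_threads)
{
    size_t per_thread = bitv_bytes(num_prims);

    /* Guard against size_t overflow in the multiplication below. */
    assert(num_threads == 0 || per_thread <= SIZE_MAX / num_threads);
    return per_thread * num_threads;
}
```

Assuming, say, 100,000 primitives and 10,000 threads in flight, that is 12,500 bytes per thread and 125,000,000 bytes in total, which is well beyond what would stay resident in on-chip cache.
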
>
> Regards,
>
> --
> Vasco Alexandre da Silva Costa
> PhD in Computer Engineering (Computer Graphics)
> Instituto Superior Técnico/University of Lisbon, Portugal
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel