On 25 Jul 2017, at 00:33, Vasco Alexandre da Silva Costa <vasco.co...@gmail.com> wrote:

On Sat, Jul 22, 2017 at 2:59 PM, Marco Domingues <marcodomingue...@gmail.com> wrote:
Well, the ‘rt -z1 -l5’ command takes 8.103 seconds for the havoc.g scene when rendering on the GPU, whereas the same scene renders in 0.558 seconds with the ANSI C code. However, when I comment out the call to the ‘build_regiontable’ function, the OpenCL code takes only 0.14 seconds. So the process of building the regiontable, evaluating partitions and resolving overlaps is a major bottleneck for this scene.

The operators.g scene takes 0.027 seconds when using the ‘rt -z1 -l5’ command, and 0.054 seconds when using the ‘rt’ command.

So I looked at your code and yes, the major bottleneck seems to be the build_regiontable() function. The ANSI C code precomputes the list of regions each solid is in before starting the boolean evaluation, around the time it parses and optimizes the boolean tree; this list is stored in stp->st_regions. It is therefore computed once, instead of being recomputed for every partition as in your code. This should be the main cause of the slowdown.

Hm, I think I could implement something similar. In the prep function I could build a buffer with the list of regions for each primitive, and then use 'seg->sti' to index the buffer and find the regions involved. This should be similar to what I am already doing to iterate over the boolean trees.
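
A minimal sketch of such a buffer in plain C, assuming a flattened offsets-plus-ids layout that could be copied to the device as two arrays. The names solid_region_table, build_table and regions_of are hypothetical and are not part of BRL-CAD; the input is assumed to be a list of (solid index, region index) membership pairs gathered once during prep.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flattened table: the regions containing solid i are
 * region_ids[offsets[i]] .. region_ids[offsets[i + 1] - 1]. */
struct solid_region_table {
    size_t nsolids;
    unsigned *offsets;    /* nsolids + 1 entries */
    unsigned *region_ids; /* offsets[nsolids] entries */
};

/* Build the table once from (solid, region) pairs.
 * Returns 0 on success, -1 on allocation failure or bad solid index. */
static int build_table(struct solid_region_table *t, size_t nsolids,
                       const unsigned (*pairs)[2], size_t npairs)
{
    size_t i;
    unsigned *cursor;

    t->nsolids = nsolids;
    t->offsets = calloc(nsolids + 1, sizeof *t->offsets);
    t->region_ids = malloc((npairs ? npairs : 1) * sizeof *t->region_ids);
    cursor = malloc((nsolids ? nsolids : 1) * sizeof *cursor);
    if (!t->offsets || !t->region_ids || !cursor)
        goto fail;

    /* Pass 1: count how many regions each solid belongs to. */
    for (i = 0; i < npairs; i++) {
        if (pairs[i][0] >= nsolids)
            goto fail;
        t->offsets[pairs[i][0] + 1]++;
    }
    /* Prefix sum turns the counts into start offsets. */
    for (i = 0; i < nsolids; i++)
        t->offsets[i + 1] += t->offsets[i];
    /* Pass 2: scatter region ids into each solid's slice. */
    for (i = 0; i < nsolids; i++)
        cursor[i] = t->offsets[i];
    for (i = 0; i < npairs; i++)
        t->region_ids[cursor[pairs[i][0]]++] = pairs[i][1];

    free(cursor);
    return 0;

fail:
    free(t->offsets);
    free(t->region_ids);
    free(cursor);
    t->offsets = NULL;
    t->region_ids = NULL;
    return -1;
}

/* Look up the regions of one solid; sti plays the role of seg->sti. */
static const unsigned *regions_of(const struct solid_region_table *t,
                                  size_t sti, size_t *count)
{
    assert(sti < t->nsolids);
    *count = t->offsets[sti + 1] - t->offsets[sti];
    return t->region_ids + t->offsets[sti];
}
```

With this layout the per-partition work reduces to walking the slices of the segments involved, rather than rescanning every region for every partition.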

I’ve just finished gathering the statistics on the scenes in the ‘share/db’ directory. I will attach the PDF with the results.

I couldn’t really figure out how to use the profiling tools you mentioned. AMD CodeXL only allowed me to use CPU Profiling Time-based Sampling with my hardware, and I couldn’t really understand its output. I had trouble installing or running the other tools with BRL-CAD, so I ended up gathering the data from the output of the ‘rt’ command.

Maybe it's not very accurate, but I compared runs of the code with the different kernels enabled and disabled, and the results pretty much confirm the bottleneck in the rt_boolfinal kernel.

Regarding your last email, yes, it seems I forgot to include the inflip and outflip in the partitions, which I will fix ASAP. I will work on that and also on the regiontable bottleneck and see what I can get!

Thanks for the help!

Regards,
Marco


Regards,

--
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel

Attachment: OpenCL_code_profiling.pdf
Description: Adobe PDF document

