Hm, I think I could implement something similar. In the prep function I could build a buffer with the list of regions for each primitive, and then use 'seg->sti' to index into that buffer and get the regions involved. This should be similar to what I am already doing to iterate over the boolean trees.

I’ve just finished gathering the statistics on the scenes in the ‘share/db’ directory. I will attach the PDF with the results.

I couldn’t really figure out how to use the profiling tools you mentioned. AMD CodeXL only allowed me to use CPU Profiling (Time-based Sampling) with my hardware, and I couldn’t really understand its output. I had trouble installing or running the other tools with BRL-CAD, so I ended up gathering the data from the output of the ‘rt’ command. Maybe it's not very accurate, but I compared runs with the different kernels enabled and disabled, and the results pretty much confirm that the bottleneck is in the rt_boolfinal kernel.

Regarding your last email: yes, it seems I forgot to include the inflip and outflip in the partitions, which I will fix ASAP. I will work on that and also on the regiontable bottleneck and see what I can get!

Thanks for the help!

Regards,
Marco
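
A minimal sketch of the region-buffer idea above, in plain C rather than the actual prep or OpenCL code. Everything here is an assumption made for illustration: the names `prim_region`, `region_table`, `region_table_build`, `region_table_lookup` and `region_table_free` are hypothetical and not part of BRL-CAD, and the input is assumed to be a flat list of (primitive index, region index) pairs collected during prep. The table is stored as two flat arrays (per-primitive offsets plus one packed array of region ids), so it could be copied to the device as ordinary buffers and indexed with the value of 'seg->sti'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical input record: primitive `prim` is used by region `region`. */
struct prim_region {
    unsigned int prim;
    unsigned int region;
};

/* Hypothetical flattened table. The regions of primitive i are
 * ids[offsets[i]] .. ids[offsets[i + 1] - 1]. */
struct region_table {
    unsigned int *offsets; /* nprims + 1 entries */
    unsigned int *ids;     /* one entry per (primitive, region) pair */
    size_t nprims;
};

/* Build the table with a counting pass, a prefix sum and a fill pass.
 * Returns 0 on success, -1 if an allocation fails. */
static int region_table_build(struct region_table *t,
                              const struct prim_region *pairs,
                              size_t npairs, size_t nprims)
{
    size_t i;
    unsigned int *cursor;

    t->nprims = nprims;
    t->offsets = calloc(nprims + 1, sizeof *t->offsets);
    t->ids = malloc((npairs ? npairs : 1) * sizeof *t->ids);
    cursor = malloc((nprims ? nprims : 1) * sizeof *cursor);
    if (!t->offsets || !t->ids || !cursor) {
        free(t->offsets);
        free(t->ids);
        free(cursor);
        t->offsets = NULL;
        t->ids = NULL;
        return -1;
    }

    /* Count how many regions reference each primitive. */
    for (i = 0; i < npairs; i++) {
        assert(pairs[i].prim < nprims);
        t->offsets[pairs[i].prim + 1]++;
    }

    /* Prefix sum turns the counts into start offsets. */
    for (i = 0; i < nprims; i++)
        t->offsets[i + 1] += t->offsets[i];

    /* Fill the packed id array, one write cursor per primitive. */
    for (i = 0; i < nprims; i++)
        cursor[i] = t->offsets[i];
    for (i = 0; i < npairs; i++)
        t->ids[cursor[pairs[i].prim]++] = pairs[i].region;

    free(cursor);
    return 0;
}

/* Look up the regions of the primitive with index `sti` (the value that
 * would come from 'seg->sti'). Stores a pointer to the first region id in
 * *out and returns the number of regions. */
static size_t region_table_lookup(const struct region_table *t,
                                  unsigned int sti,
                                  const unsigned int **out)
{
    assert(sti < t->nprims);
    *out = t->ids + t->offsets[sti];
    return t->offsets[sti + 1] - t->offsets[sti];
}

/* Release the two arrays owned by the table. */
static void region_table_free(struct region_table *t)
{
    free(t->offsets);
    free(t->ids);
    t->offsets = NULL;
    t->ids = NULL;
}
```

For example, with three primitives and the pairs (0,0), (1,0), (1,2), (2,1), (1,3), the offsets come out as 0, 1, 4, 5, and a lookup with sti = 1 yields the three regions 0, 2 and 3. Keeping the table as two flat arrays avoids per-primitive pointers, which would not survive a copy to the device.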
Attachment: OpenCL_code_profiling.pdf (Adobe PDF document)
_______________________________________________ BRL-CAD Developer mailing list brlcad-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/brlcad-devel