> On 25 Jul 2017, at 20:49, Vasco Alexandre da Silva Costa
> <vasco.co...@gmail.com> wrote:
>
> On Tue, Jul 25, 2017 at 7:15 PM, Marco Domingues
> <marcodomingue...@gmail.com> wrote:
> I’ve just finished gathering the statistics on the scenes in the ‘share/db’
> directory. I will attach the pdf with the results.
>
> It is amazing how c) and d) are so much slower than b). It should have only
> been like 2x slower. I guess this is due to the larger working set in memory.
> With the list of segments spread over a large amount of memory the
> 'shade_segs' phase will have poor memory coherency. It is particularly bad in
> goliath.g, which is the scene with the most depth complexity.
>
> It does not make sense that g) is faster than f) though.
Yes, I also found it strange. When I run the code on the GPU with the
‘shade_segs’ kernel disabled, the frame buffer gets filled with noise, so I
don’t know if this can be the cause. I can also repeat the tests for g) and f)
to make sure that I did them right.
>
> You should use more appropriate measures, i.e. 's' or 'ms' for each cell,
> depending on how much time it takes, instead of fractions.
> Or MB vs KB, etc. Also use the same number format everywhere (e.g. %.2f) and
> use American number format for the fraction separator i.e. '.' vs ','.
>
> In Table 3 "Other metrics" what does xx/yy in the "Partitions" mean? Is this
> used vs allocated partitions? The amount of wasted memory seems particularly
> bad in truck.g if that is the case. It would be nice to reduce memory
> consumption with partitions further. Still, at least all these scenes would
> easily fit into the typical memory of a graphics card with under 512 MB total
> footprint.
I will format the tables properly, thanks for the advice! And yes, in table 3
xx/yy in the “Partitions” means used/allocated partitions.
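
For the table formatting, something like the following could work. It is only a minimal sketch of the advice above (a unit chosen per cell, '%.2f' everywhere, '.' as the separator). The names format_time and format_size are hypothetical, and 1 KB = 1024 bytes is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a duration given in seconds, picking 's' or 'ms' by magnitude.
 * In the default "C" locale printf uses '.' as the decimal separator. */
void format_time(double seconds, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (seconds >= 1.0)
        snprintf(buf, len, "%.2f s", seconds);
    else
        snprintf(buf, len, "%.2f ms", seconds * 1000.0);
}

/* Format a byte count as KB or MB (assumes 1 KB = 1024 bytes). */
void format_size(double bytes, char *buf, size_t len)
{
    const double kb = 1024.0;
    const double mb = 1024.0 * 1024.0;

    assert(buf != NULL && len > 0);
    if (bytes >= mb)
        snprintf(buf, len, "%.2f MB", bytes / mb);
    else
        snprintf(buf, len, "%.2f KB", bytes / kb);
}
```

Each table cell would then go through one of these helpers, so the number format stays the same everywhere.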
>
> I couldn’t really figure out how to use the profiling tools you mentioned.
> Well, the AMD CodeXL only allowed me to use CPU Profiling Time-based Sampling
> with my hardware, but I couldn’t really understand the output from it. The
> other tools I had trouble installing/running with BRL-CAD, so I ended up
> gathering the data with the output from the ‘rt’ command.
>
> We'll have to talk about this over Skype I guess. I'm going to be a bit busy
> the next couple of days though so perhaps we'll have to do it Friday or early
> next week. Still the statistics you gathered are enough to start optimizing
> the code.
Seems good! Could we do this early next week, if that is okay? I’m not sure I
will be able to be at the computer on Friday afternoon.
Meanwhile I will be working on optimizing the process of building the
regiontable and fixing the issue with the normals!
Cheers,
Marco
>
> Maybe it’s not very accurate, but I tried to compare the code with the
> different kernels enabled/disabled and it pretty much confirms the bottleneck
> in the rt_boolfinal kernel.
>
> Everywhere you see loops within loops, large branch-heavy kernels, or memory
> walks with large strides, it is a good hint that code could be further
> optimized.
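
To make the "memory walks with large strides" hint concrete, here is a generic sketch in plain C. It is not taken from the BRL-CAD kernels, and the function names are hypothetical. Both functions compute the same sum over a row-major matrix, but the second one jumps `cols` elements on every inner-loop step, which usually makes much poorer use of the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major, walking memory contiguously. */
double sum_row_major(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;

    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];   /* inner loop stride: 1 element */
    return total;
}

/* Same sum, but the inner loop steps through memory cols elements at a time. */
double sum_col_major(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;

    assert(m != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];   /* inner loop stride: cols elements */
    return total;
}
```

On a GPU the analogous concern is usually whether neighbouring work-items touch neighbouring addresses, so the same loop-order reasoning applies when deciding how the segment lists are laid out.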
>
> --
> Vasco Alexandre da Silva Costa
> PhD in Computer Engineering (Computer Graphics)
> Instituto Superior Técnico/University of Lisbon, Portugal
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org!
> http://sdm.link/slashdot
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel