On Sun, Jul 30, 2017 at 11:20 PM, Christopher Sean Morrison <brl...@mac.com>
wrote:
> That screenshot's timings clearly have some issue(s) that give pause. At
> a glance, I’m not sure what those numbers are saying other than there’s a
> huge chunk of time introduced somewhere outside the ray intersection
> testing.
>
> If the 0.55s RTFM number can be trusted and the rays/sec on that line is 0
> simply because it’s not yet tracking the number of rays from the OpenCL
> side, then that SHOULD represent time spent in ray traversal, intersection
> testing, boolean evaluation, and shading. That is, it’s the most important
> number for your project.
>
Yes, the OpenCL side uses a completely different main loop, so it does not
track the number of rays fired. We use a different main loop because the
pipeline is set up differently, in stages, to reduce thread divergence.
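
A minimal sketch of what "in stages" means here, written as plain C rather
than OpenCL. The struct, the toy hit test, and all function names are made
up for illustration and are not the actual BRL-CAD code. The point is only
that each stage runs over the whole batch of rays before the next stage
starts, so neighbouring work items execute the same code at the same time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-ray record; not the actual BRL-CAD OpenCL data layout. */
struct ray {
    float dir_z;  /* toy input: rays with dir_z < 0 count as hits */
    int   hit;    /* written by the intersect stage */
    float shade;  /* written by the shade stage */
};

/* Stage 1: every ray runs the same (toy) intersection test.
 * Returns the number of rays that hit something. */
static size_t stage_intersect(struct ray *rays, size_t n)
{
    size_t i, nhits = 0;
    assert(rays != NULL || n == 0);
    for (i = 0; i < n; i++) {
        rays[i].hit = rays[i].dir_z < 0.0f;
        nhits += (size_t)rays[i].hit;
    }
    return nhits;
}

/* Stage 2: shading; only starts once stage 1 is done for all rays. */
static void stage_shade(struct ray *rays, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        rays[i].shade = rays[i].hit ? 1.0f : 0.0f;
}

/* Hypothetical driver: one pass per stage instead of one big loop per ray.
 * On the GPU each stage would be a separate kernel launch. */
static size_t render_staged(struct ray *rays, size_t n)
{
    size_t nhits = stage_intersect(rays, n);
    stage_shade(rays, n);
    return nhits;
}
```

Because the per-ray loop of the CPU path never runs in this layout, the ray
counter that loop maintains is never updated, which is why the rays/sec
figure shows 0.
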
> What is really curious is the "SHOT:" line that says it spent 2.18s of cpu
> time and 2.54s elapsed. The elapsed time SHOULD be nearly identical to the
> RTFM number, at least for sunk data. This discrepancy could be OpenGL
> inefficiency.
>
> Add -o /dev/null to sink the image, so we’re not timing how long it takes
> to draw pixels in the window. If the wallclock elapsed time is still
> high, that would indicate a problem somewhere. If it drops to 0.55,
> that’ll be good.
>
Displaying the image is suboptimal because we send the primitives to the
GPU, render everything there into a pixel buffer, copy that buffer back to
the CPU, and then copy it to the GPU again for display via OpenGL or
whatever the display device uses. So there is a round trip involved, where
a CPU renderer only needs a one-way trip. The fix would be to change the
output display device so that, given a pointer to a buffer on the GPU, it
renders directly from that. This has not been done yet for several reasons;
for one, the code would be highly dependent on the output device.
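
To put a rough number on it, here is a small sketch using the two figures
quoted above (0.55s for the RTFM line, 2.54s elapsed on the SHOT line). The
helper names are hypothetical. Note that the difference covers everything
outside the traced region, so it includes prep and startup as well as the
display round trip; it does not isolate the display cost by itself:

```c
#include <assert.h>

/* Hypothetical helper: seconds of elapsed time that are not covered by
 * the traced region (traversal, intersection, booleans, shading). */
static double untraced_seconds(double elapsed_s, double traced_s)
{
    assert(elapsed_s >= traced_s);
    return elapsed_s - traced_s;
}

/* Hypothetical helper: the same quantity as a fraction of elapsed time. */
static double untraced_fraction(double elapsed_s, double traced_s)
{
    assert(elapsed_s > 0.0);
    return untraced_seconds(elapsed_s, traced_s) / elapsed_s;
}
```

With untraced_seconds(2.54, 0.55) this gives about 1.99s, or roughly 78% of
the elapsed time. Rerunning with -o /dev/null as suggested should show how
much of that is really the display path.
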
> If the RTFM line is the important line, I think the time displayed is
> quite good actually, compared with the elapsed time I was using before. But
> I am curious about the meaning of the 3rd and 4th line, since they seem to
> be the reason for the long times when rendering the scenes.
>
>
> The third line is cpu time (man time), i.e., the number of clock ticks
> across all CPUs. Generally, cputime is wallclock times your number of
> threads. At least, it would be if you had perfect scaling efficiency,
> which is rare. The wallclock line is everything including application
> startup cost, prep, releasing memory, etc.
>
I usually just take the wallclock time into account for the GPU, because
there is more to it than the time spent computing on the device, and bus
transfers can sometimes be quite significant. But yes, these statistics
could be improved.
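
For reference, a minimal sketch of the two clocks being discussed, assuming
a POSIX system. The helper names are hypothetical and this is not how rt
collects its statistics. clock() reports CPU time summed over the threads
of the process, while CLOCK_MONOTONIC gives wallclock time, which is the
only one of the two that also covers waiting on the device and on bus
transfers:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wallclock seconds from a monotonic clock (hypothetical helper). */
static double now_wall(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds consumed by this process so far (hypothetical helper). */
static double now_cpu(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Scaling efficiency as described in the quoted text: with perfect
 * scaling, cpu time equals wallclock time multiplied by the number of
 * threads, so this ratio would be 1.0. */
static double scaling_efficiency(double cpu_s, double wall_s, int nthreads)
{
    assert(wall_s > 0.0 && nthreads > 0);
    return cpu_s / (wall_s * (double)nthreads);
}
```

Sampling now_wall() and now_cpu() before and after a render and taking the
differences gives the two numbers to compare. For a GPU run the CPU figure
can be well below the wallclock one, since the host is mostly waiting.
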
--
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel