On Thu, Aug 10, 2017 at 11:36 PM, Marco Domingues wrote:
> Thanks for reviewing my code and making the adjustments, Vasco! I’ve
> integrated the changes in my patch.
> I’ve finished the port of the new bool_eval() function to OpenCL, and
> although performance improved, it wasn’t enough to outperform the ANSI
> C code with the Release build.
> For the havoc scene, I got 1.56sec now vs 2.10sec before, when running the
> OpenCL code on my GPU. (command ‘rt -z1 -l5 -s1024’). For reference, the
> same scene renders in 0.63sec with the ANSI C code currently in the trunk.
> Despite that, when I ran the OpenCL code on my CPU, I got 0.64sec now vs
> 2.79sec before. (command ‘rt -z1 -l5 -s1024’).
So let me get this straight. The OpenCL backend is slower on your GPU than
the CPU-based trunk/ ANSI C backend. That's not totally unexpected. You
have a consumer GPU with nerfed double-precision floating-point (DP FP)
performance.
What I want you to do tomorrow is to compare the trunk/ ANSI C backend with
the OpenCL backend over your CPU with the AMD and Intel OpenCL
implementations. I also want you to time the results with the single-hit
mode if you have the time for that.
Why are the 'rt -s1024' times in your July 27 post different from the times
in your August 7 post?
> I am a little intrigued by this, because smaller scenes like
> operators.g are clearly faster when using the GPU (0.06sec gpu vs 0.16sec
> cpu). Any explanation?
Those scenes are fillrate limited with little depth or scene complexity.
> Another thing that caught my attention was how close the RTFM and
> wallclock lines from the ‘rt’ output are when running the OpenCL code on the CPU,
> compared with the same lines when running the OpenCL code on the GPU (i.e.
> 0.60 and 0.65 sec - cpu vs 0.32 and 1.65 sec - gpu).
> Couldn’t the big difference on the GPU side be caused from transfers
> between CPU-GPU and not by performing ray-intersections, boolean evaluation
> and shading operations? Is there a way to investigate this?
The best way is to use a profiler like the ones I mentioned before.
Alternatively, one can time the transfers vs the computations by timing the
appropriate CL calls, making sure to clFinish() the queue before you measure.
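
The manual timing approach can be sketched roughly as below. This is only a
minimal illustration, not code from the BRL-CAD tree: `time_stage`, `stage_fn`,
and the `fake_*` stand-ins are hypothetical names. In real code the `enqueue`
callback would wrap a call such as clEnqueueWriteBuffer() or
clEnqueueNDRangeKernel(), and the `finish` callback would wrap clFinish() on the
same command queue. It assumes a POSIX system for clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type; real callbacks would wrap OpenCL calls. */
typedef void (*stage_fn)(void *ctx);

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Time a single stage (a transfer or a kernel launch).
 * `enqueue` submits the work; `finish` blocks until the queue is empty,
 * which is the role clFinish() plays in real OpenCL code.
 */
static double time_stage(stage_fn enqueue, stage_fn finish, void *ctx)
{
    double t0;

    assert(enqueue != NULL && finish != NULL);
    finish(ctx);        /* drain earlier work so it is not billed here */
    t0 = now_sec();
    enqueue(ctx);       /* asynchronous: may return before the work is done */
    finish(ctx);        /* wait for completion before stopping the clock */
    return now_sec() - t0;
}

/* Stand-in for a command queue, only here to make the sketch runnable. */
struct fake_queue { int pending; int completed; };

static void fake_enqueue(void *ctx)
{
    ((struct fake_queue *)ctx)->pending++;
}

static void fake_finish(void *ctx)
{
    struct fake_queue *q = ctx;
    q->completed += q->pending;
    q->pending = 0;
}
```

Calling time_stage() once for the host-to-device write, once for the kernel,
and once for the read-back gives a rough split between transfer time and
compute time. Without the first finish() call, work left in the queue from an
earlier stage would be counted against the stage being measured.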
> Tomorrow I will update the tables that I shared before in my
> document, now using Release builds. I will also include side-by-side
> image comparisons between the ANSI C and OpenCL results for each scene.
Ok. Make sure to try with both the AMD and Intel OpenCL SDKs over the CPU.
I'm not interested in your GPU results right now as it would only
complicate the comparison.
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
BRL-CAD Developer mailing list