Hi

I found that thread very interesting, and I experimented a bit
(unfortunately with no success).  I would still like to describe what I
did; maybe this helps someone find a better approach.

I used Irrlicht demo 2 for profiling.  This demo loads a Quake level in
which you can walk around.  It is textured, and the triangles are probably
not too small.  On the downside, it only displays frames per second as
integer values, which tend to be around 5 or 6 on my computer, so the
measurements are not very precise (about +-20%).  BTW: tests were done on
an Athlon X2 in 64-bit mode.

I used "valgrind --tool=callgrind"  to find the functions that used up the
largest chunk of time.   Then I started to go through the code.

Conceptually the easiest thing was to add a "#pragma omp parallel for" to
the loops of the function "sample_linear_2d", since I knew that in my
example all rendering operations used that function.  The good news was
that no rendering errors happened afterwards, so the code really is
parallelizable.  The bad news was that the fps went down from 5 or 6 to 2
or 3!

Starting from that point I played around.
Adding "#pragma omp parallel for if (n > XX)" helps a lot, but the frame
rate only approaches the 5 fps from below as I increase XX.  In the end I
guess that XX is so large that no span is rendered in parallel mode at
all, so nothing is gained.  On the other hand, I could very well believe
that if there were sufficiently large spans, this parallelization would
help a lot, but that would be such an exceptional situation, very far
away from actual uses of Mesa.
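
For reference, the kind of loop I mean looks roughly like the sketch below.
This is not the actual sample_linear_2d code: sample_span, texel_at and
SPAN_THRESHOLD are hypothetical names, and a nearest-texel lookup on a 1-D
texture stands in for the real bilinear filter.  The if clause keeps the
loop serial whenever the span has SPAN_THRESHOLD pixels or fewer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-off: spans with at most this many pixels stay serial. */
#define SPAN_THRESHOLD 256

/* Hypothetical stand-in for a texture fetch: nearest texel of a 1-D texture. */
static float texel_at(const float *tex, size_t tex_size, float s)
{
    size_t i;
    if (s < 0.0f) s = 0.0f;
    if (s > 1.0f) s = 1.0f;
    i = (size_t)(s * (float)(tex_size - 1) + 0.5f);
    return tex[i];
}

/* Sample n pixels of one span.  Each iteration writes only out[i], so the
 * iterations are independent and may be split across threads. */
static void sample_span(const float *tex, size_t tex_size,
                        const float *texcoords, float *out, int n)
{
    int i;
    assert(n >= 0 && tex_size > 0);
#pragma omp parallel for if (n > SPAN_THRESHOLD)
    for (i = 0; i < n; i++) {
        out[i] = texel_at(tex, tex_size, texcoords[i]);
    }
}
```

Built without OpenMP support the pragma is ignored and the loop simply runs
serially, which makes it easy to compare both variants.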

I went one function up, i.e. to "_swrast_texture_span", and added there an
OpenMP construct like

#pragma omp parallel
#pragma omp single

...

and then "#pragma omp for" wherever it seemed to make sense (also
in subfunctions).  Good thing: still no rendering errors.  Bad thing: the
fps went down.  I added if-clauses everywhere and played with parameters,
but got no real improvements...

Conclusion: Maybe the "#pragma omp parallel for" would help in more
complicated functions.  Someone should try, for example,
"sample_3d_linear_mipmap_linear".  Unfortunately I have no good test
program to try this myself.

The next step in the experimentation should be to go up another level and
try to add threads there.  All experiments I did so far were on the level
of individual spans, but now one should go into the triangle rendering
functions and try to compute two (or four, on a quad-core CPU) spans in
parallel.  Unfortunately this will require modifying the sources a lot,
and the potential results are unclear.
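
To make the idea concrete: instead of splitting the pixels of one span
across threads, one would split the scanlines of a triangle.  A minimal
sketch, assuming a simplified span description (struct row_span and
fill_rows are hypothetical, not Mesa's types) and assuming every span lies
on its own framebuffer row, so that no two threads write the same pixel:

```c
#include <assert.h>

/* Hypothetical, simplified span: one horizontal run of pixels on row y. */
struct row_span {
    int y;        /* framebuffer row */
    int x_start;  /* first pixel (inclusive) */
    int x_end;    /* last pixel (exclusive) */
};

/* Fill every span with a constant value, one span per loop iteration.
 * Assumes each span has a distinct y, so the rows can be processed
 * independently by different threads. */
static void fill_rows(unsigned char *fb, int width,
                      const struct row_span *spans, int num_spans,
                      unsigned char value)
{
    int r;
    assert(width > 0 && num_spans >= 0);
#pragma omp parallel for schedule(static)
    for (r = 0; r < num_spans; r++) {
        int x;
        unsigned char *row = fb + (long)spans[r].y * width;
        for (x = spans[r].x_start; x < spans[r].x_end; x++) {
            row[x] = value;
        }
    }
}
```

Whether this pays off in the real rasterizer is exactly the open question;
the sketch only shows where the parallel loop would sit.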

The functions to consider here are defined in s_triangle.c.  The file
contains function templates that are expanded to real functions by
including s_tritemp.h.  Basically, if you need a triangle function that
does no texturing, you only define INTERP_RGBA, and maybe INTERP_Z or
something, define the name you want the function to have, and then include
the "s_tritemp.h" file.  This contains lots of "#ifdef INTERP_RGBA" clauses
that then do exactly the interpolations you need.  That way the
preprocessor generates the function you wanted, optimized for your needs.
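
The real mechanism uses #ifdef blocks inside an included header, which
cannot be shown in a single file.  As a rough single-file stand-in, the
sketch below generates specialized functions from one macro body instead;
DEFINE_SPAN_FUNC, struct pixel and the two generated function names are
hypothetical and only illustrate the principle of "pick the interpolations
at compile time".

```c
#include <assert.h>

/* Hypothetical per-pixel record; not Mesa's sw_span. */
struct pixel { float rgba[4]; float z; };

/* Stand-in for "#define the options, then #include the template": the
 * macro body plays the role of the template header.  DO_RGBA and DO_Z are
 * compile-time constants, so the compiler can drop the unused branches. */
#define DEFINE_SPAN_FUNC(NAME, DO_RGBA, DO_Z)                          \
    static void NAME(struct pixel *px, int n, float start, float step) \
    {                                                                  \
        int i, c;                                                      \
        assert(n >= 0);                                                \
        for (i = 0; i < n; i++) {                                      \
            float v = start + step * (float)i;                         \
            if (DO_RGBA)                                               \
                for (c = 0; c < 4; c++) px[i].rgba[c] = v;             \
            if (DO_Z)                                                  \
                px[i].z = v;                                           \
        }                                                              \
    }

/* Two specializations generated from the same body. */
DEFINE_SPAN_FUNC(span_rgba_only, 1, 0)
DEFINE_SPAN_FUNC(span_rgba_z, 1, 1)
```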

My apologies if I'm stating the obvious here.  In any case, maybe one could
define an INTERP_MULTITHREADED or so, and then take care of this in
s_tritemp.h.  A lot of work, and it is unclear what the benefit would be.
You would need to adapt the "struct sw_span" in s_span.h.  The file already
contains a suggestion:

"...*
 * It would be interesting to experiment with multiprocessor rasterization
 * with this structure.  The triangle rasterizer could simply emit a
 * stream of these structures which would be consumed by one or more
 * span-processing threads which could run in parallel.
 */
"
I think sw_span is also "anchored" in the rendering context in s_context.h,
so maybe it would be sufficient to add an array of sw_spans there, indexed
by thread id, such that thread 0 will always work on ctx->span[0],
thread 1 will always work on ctx->span[1], etc.
The more I think about it, the more interesting I find it... I have a lot
of work to do, but Giannis, if you are really interested in trying this
out, we can certainly discuss it a bit...
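
The per-thread span array could look roughly like the sketch below.  All
names here (mini_span, mini_context, MAX_THREADS, process_rows) are
hypothetical and heavily reduced; the real sw_span and context are much
larger.  The only point is that each thread uses ctx->span[its own id] as
scratch space, so threads never share a span.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 8   /* hypothetical upper bound on worker threads */
#define MAX_WIDTH   64  /* hypothetical maximum span length */

/* Hypothetical, reduced stand-ins for sw_span and the rendering context. */
struct mini_span { int y; int count; float z[MAX_WIDTH]; };
struct mini_context { struct mini_span span[MAX_THREADS]; };

/* Returns the calling thread's id, or 0 when built without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Process rows in parallel.  Each thread fills only its own scratch span
 * and writes only row_sum[y] for the rows it was given. */
static void process_rows(struct mini_context *ctx, int rows, int width,
                         float *row_sum)
{
    int y;
    assert(rows >= 0 && width >= 0 && width <= MAX_WIDTH);
#pragma omp parallel for num_threads(MAX_THREADS)
    for (y = 0; y < rows; y++) {
        struct mini_span *s = &ctx->span[thread_id()];
        int x;
        float sum = 0.0f;
        s->y = y;
        s->count = width;
        for (x = 0; x < width; x++) s->z[x] = (float)(y + x);
        for (x = 0; x < width; x++) sum += s->z[x];
        row_sum[y] = sum;
    }
}
```

The num_threads clause caps the team size at MAX_THREADS, so the thread id
is always a valid index into ctx->span.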

I don't know whether, from a practical point of view, it will be worth the
trouble.  Everybody has multi-core CPUs today or will tomorrow, but only
few people who are really interested in 3D use software rendering.  On top
of that, it is unclear whether the speedup would be large or not.

Cheers

Klaus


