Hi, I found that thread very interesting and experimented a bit (unfortunately without success). I would still like to describe what I did; maybe it helps someone find a better approach.
I profiled with Irrlicht demo 2. This demo loads a Quake level that you can walk around in. It is textured, and the triangles are probably not too small. On the downside, it only displays frames per second as integer values, which tend to be around 5 or 6 on my computer, so the measurements are not very precise (about ±20%). BTW: the tests were done on an Athlon X2 in 64-bit mode.
I used "valgrind --tool=callgrind" to find the functions that took the largest share of the time, and then started going through the code. Conceptually the easiest thing was to add a "#pragma omp parallel for" to the loops of the function "sample_linear_2d", since I knew that in my example all rendering operations went through that function. The good news was that no rendering errors appeared afterwards, so the code really is parallelizable. The bad news was that the fps went down from 5 or 6 to 2 or 3!
From there I played around. Adding "#pragma omp parallel for if (n > XX)" helps a lot, but the fps only approaches 5 from below as I increase XX. In the end I guess XX becomes so large that no span is rendered in parallel at all, so nothing is gained. On the other hand, I can well believe that this parallelization would help a lot with sufficiently large spans, but that would be an exceptional situation, far from actual uses of Mesa.
I then went one function up, to "_swrast_texture_span", and added an OpenMP construct like
  #pragma omp parallel
  #pragma omp single
  ...
and then a "#pragma omp for" wherever it seemed to make sense (also in subfunctions). Good thing: still no rendering errors. Bad thing: the fps went down. I added if clauses everywhere and played with the parameters, but got no real improvement.
Conclusion: maybe "#pragma omp parallel for" would help in more complicated functions. Someone should try, for example, "sample_3d_linear_mipmap_linear". Unfortunately I have no good test program to try this myself.
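For illustration, the if-clause experiment described above could be sketched roughly as follows. This is a minimal, self-contained sketch and not Mesa code: sample_texel and sample_span are hypothetical stand-ins for the real sampling functions, the 1D texture is a simplification, and PAR_THRESHOLD plays the role of XX with an arbitrary value. If OpenMP is not enabled at compile time (e.g. no -fopenmp with GCC), the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Hypothetical threshold (the "XX" above): shorter spans stay serial. */
#define PAR_THRESHOLD 4096

/* Stand-in for a per-pixel texture lookup (NOT Mesa's sample_linear_2d):
 * linear interpolation in a 1D texture, with s expected in [0, 1]. */
static float sample_texel(const float *tex, int tex_size, float s)
{
    float pos = s * (float)(tex_size - 1);
    int i0 = (int)pos;
    int i1 = (i0 + 1 < tex_size) ? i0 + 1 : i0;   /* clamp at the edge */
    float frac = pos - (float)i0;
    return tex[i0] * (1.0f - frac) + tex[i1] * frac;
}

/* Fill one span of n pixels. Every iteration is independent, so the
 * loop may be split across threads, but only when n is large enough
 * to have a chance of paying for the thread start-up. */
static void sample_span(const float *tex, int tex_size,
                        const float *coords, float *out, int n)
{
    int i;
    assert(tex_size > 0 && n >= 0);
#pragma omp parallel for if (n > PAR_THRESHOLD)
    for (i = 0; i < n; i++) {
        out[i] = sample_texel(tex, tex_size, coords[i]);
    }
}
```

The cost of starting a parallel region is paid once per span, so short spans probably cannot amortize it, which would be consistent with the slowdown described above.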
The next step in the experimentation should be to go up another level and try to add threads there. All experiments I did so far were on the level of individual spans; now one should go into the triangle rendering functions and try to compute two spans in parallel (or four on a quad-core CPU). Unfortunately this will require modifying the sources a lot, and the potential results are unclear.
The functions to consider here are defined in s_triangle.c. The file contains function templates that are expanded to real functions by including s_tritemp.h. Basically, if you need a triangle function that does no texturing, you define only INTERP_RGBA, and maybe INTERP_Z or something, define the name you want the function to have, and then include the "s_tritemp.h" file. That file contains lots of "#ifdef INTERP_RGBA" clauses that perform exactly the interpolations you asked for. That way the preprocessor generates the function you wanted, optimized for your needs. My apologies if I'm stating the obvious here.
In any case, maybe one could define an INTERP_MULTITHREADED or so, and then take care of it in s_tritemp.h. That is a lot of work, and it is unclear what the benefit will be.
You would need to adapt the "struct sw_span" in s_span.h. The file already contains a suggestion:
  ...
   * It would be interesting to experiment with multiprocessor rasterization
   * with this structure.  The triangle rasterizer could simply emit a
   * stream of these structures which would be consumed by one or more
   * span-processing threads which could run in parallel.
   */
I think sw_span is also "anchored" in the rendering context in s_context.h, so maybe it would be sufficient to add an array of sw_spans there, indexed by thread id, such that thread 0 always works on ctx->span[0], thread 1 always works on ctx->span[1], etc. The more I think about it, the more interesting I find it...
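The per-thread span idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: struct demo_span and struct demo_context are heavily simplified stand-ins for sw_span and the rendering context, MAX_THREADS and MAX_WIDTH are arbitrary, and a trivial right triangle replaces the real rasterizer. The point is only that each thread fills its own scratch span, ctx->span[id], and that different rows write to disjoint parts of the framebuffer.

```c
#include <assert.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4    /* hypothetical cap, e.g. a quad-core CPU */
#define MAX_WIDTH   64   /* hypothetical maximum span length */

/* Simplified stand-in for struct sw_span (the real one has far more fields). */
struct demo_span {
    int x, y;                        /* first pixel of the span */
    int end;                         /* number of pixels in the span */
    unsigned char grey[MAX_WIDTH];   /* one value per pixel, for brevity */
};

/* Simplified stand-in for the context: one scratch span per thread. */
struct demo_context {
    struct demo_span span[MAX_THREADS];
    unsigned char framebuffer[MAX_WIDTH * MAX_WIDTH];
    int width, height;
};

/* Thread id, or 0 when compiled without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Fill the right triangle whose row y covers x = 0..y, one span per row.
 * Rows are distributed over the threads; thread t only touches span[t]. */
static void fill_right_triangle(struct demo_context *ctx, int size,
                                unsigned char value)
{
    int y;
    assert(size <= ctx->width && size <= ctx->height);
    assert(ctx->width <= MAX_WIDTH && ctx->height <= MAX_WIDTH);
#pragma omp parallel for num_threads(MAX_THREADS)
    for (y = 0; y < size; y++) {
        struct demo_span *span = &ctx->span[thread_id()];
        span->x = 0;
        span->y = y;
        span->end = y + 1;
        memset(span->grey, value, (size_t)span->end);
        /* "Write span": copy the scratch span into its framebuffer row. */
        memcpy(&ctx->framebuffer[span->y * ctx->width + span->x],
               span->grey, (size_t)span->end);
    }
}
```

Whether a real triangle function has enough work per row to make this pay off is exactly the open question; the sketch only shows that the data layout itself is straightforward.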
I have a lot of work to do, but Giannis, if you are really interested in trying this out, we can certainly discuss it a bit... I don't know whether it will be worth the trouble from a practical point of view. Everybody has, or will soon have, a multi-core CPU, but few of the people who are really interested in 3D use software rendering. It is also unclear whether the speedup would be large or not.
Cheers
Klaus
_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev