[EMAIL PROTECTED] wrote:
> Hi
>
> I found that thread very interesting, and I experimented a bit
> (unfortunately with no success). I would still like to describe what I
> did, maybe this helps someone find a better approach.
>
> I used the Irrlicht-demo 2 to profile. This demo loads a quake level, in
> which you can walk around. It is textured, and the triangles are probably
> not too small. Hmm, on the downside, it only displays frames per second
> (integer values), which tend to be around 5 or 6 on my computer, so that
> the values are not very precise (+-20%). BTW: Tests were done on an
> AthlonX2 in 64-bit mode.
>
> I used "valgrind --tool=callgrind" to find the functions that used up the
> largest chunk of time. Then I started to go through the code.
>
> Conceptually the easiest thing was to add a "#pragma omp parallel for" in
> the loops of the function "sample_linear_2d", and I knew that in my
> example all rendering operations used that function. The good news was
> that no rendering errors happened afterwards, so the code is really
> parallelizable. The bad news was that the fps went down from 5 or 6 to 2 or
> 3!!!
>
> Starting from that point I played around.
> Adding "#pragma omp parallel for if (n > XX)" helps a lot, but only
> approaches the 5 frames from below as I increase the XX. In the end I
> guess that XX is so large that no span is rendered in parallel mode, so...
> On the other hand I could very well believe that if there were
> sufficiently large spans, this parallelization would help a lot, but then
> this would be such an exceptional situation that is very far away from
> actual uses of Mesa.
>

That's actually quite interesting.
However, I think you are being too fine-grained: you chop the work into very small chunks, execute each chunk in parallel, and synchronize at the end (there is an implicit barrier at the end of each #pragma omp parallel region), so more time is spent synchronizing than computing.
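To make the granularity point concrete, here is a minimal sketch, not Mesa code. The names shade, render_fine, render_coarse and renders_match are hypothetical, and the "framebuffer" is just a small array. The first variant opens a parallel region per span, as in the experiment described above; the second opens a single region and hands each thread whole rows. The pragmas are simply ignored when OpenMP is not enabled, so both variants still produce the same image.

```c
#include <assert.h>
#include <string.h>

#define ROWS  64
#define WIDTH 32

/* Hypothetical stand-in for per-pixel texturing work (not a Mesa function). */
static unsigned char shade(int x, int y)
{
    return (unsigned char)((x * 31 + y * 17) & 0xff);
}

/* Fine-grained: one parallel region per span, so the team of threads
 * is forked and joined (implicit barrier) once for every row. */
static void render_fine(unsigned char fb[ROWS][WIDTH])
{
    assert(fb != NULL);
    for (int y = 0; y < ROWS; y++) {
        #pragma omp parallel for
        for (int x = 0; x < WIDTH; x++)
            fb[y][x] = shade(x, y);
    }
}

/* Coarse-grained: a single parallel region; each thread processes
 * whole rows and there is only one barrier, at the very end. */
static void render_coarse(unsigned char fb[ROWS][WIDTH])
{
    assert(fb != NULL);
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < ROWS; y++) {
        for (int x = 0; x < WIDTH; x++)
            fb[y][x] = shade(x, y);
    }
}

/* Returns 1 if both variants produce identical output. */
static int renders_match(void)
{
    static unsigned char a[ROWS][WIDTH], b[ROWS][WIDTH];
    render_fine(a);
    render_coarse(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions compute the same pixels; only the number of fork/join points differs (ROWS versus one). Whether the coarse variant is actually faster on a given machine would have to be measured.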
The best approach would be to partition the work as much as possible up front and then execute the pieces in parallel without synchronization in between.

> I went one function up, i.e. to "_swrast_texture_span", and added there an
> omp-construct like
>
> #pragma omp parallel
> #pragma omp single
>
> ...
>
> and then the "#pragma omp for" wherever it seemed to make sense (also
> in subfunctions). Good thing: still no rendering errors; bad thing: fps went
> down. I added if-clauses everywhere and played with parameters but no
> real improvements...
>
> Conclusion: Maybe the "#pragma omp parallel for" would help in more
> complicated functions. Someone should try for example
> "sample_3d_linear_mipmap_linear". Unfortunately I have no good test
> program to try this myself.
>
> The next step in the experimentation should be to go up another level, and
> try to add threads there. All experiments I did so far were on the level
> of individual spans, but now one should go into the triangle rendering
> functions and try to compute two (or 4 if one has a QuadCPU) spans in
> parallel. Unfortunately this will require modifying the sources a lot,
> and the potential results are unclear.
>

I've been messing with the cell driver and looking at how the parallelization is done there for the SPEs. I guess that should be the way to go; maybe you want to take a look at it.

> The functions to consider here are defined in s_triangle.c. The file
> contains function templates that are expanded to real functions by reading
> in s_tritemp.h. Basically if you need a triangle function that does no
> texturing, then you only define INTERP_RGBA, and maybe INTERP_Z or
> something, define the name you want the function to have, and then include
> the "s_tritemp.h" file. This contains lots of #ifdef INTERP_RGBA-clauses
> that then do exactly the interpolations you needed. And that way the
> preprocessor has then generated the function you wanted to have, optimized
> for your needs.
>
> My excuses if I'm telling trivialities here.
> In any case, maybe one could
> define an INTERP_MULTITHREADED or so, and then take care of this in
> s_tritemp.h. Lot of work. It is unclear what the benefit will be.
> You need to adapt the "struct sw_span" in s_span.h. The file already
> contains suggestions
>
> "...*
> * It would be interesting to experiment with multiprocessor rasterization
> * with this structure. The triangle rasterizer could simply emit a
> * stream of these structures which would be consumed by one or more
> * span-processing threads which could run in parallel.
> */
> "
> I think sw_span was also "anchored" in the rendering context s_context.h,
> so maybe it would be sufficient to add there an array of sw_spans indexed
> by the ids of threads such that thread 0 will always run on ctx->span[0],
> thread 1 will always run on ctx->span[1], etc.
> The more I think about it, the more interesting I find it... I have a lot
> of work to do, but Giannis, if you are really interested in trying this
> out, we can certainly discuss a bit...
>
> I don't know if from a practical point of view it will be worth the
> trouble. Everybody has multi-core CPUs today or tomorrow, but then only
> few people really interested in 3D use software rendering. On the other
> hand it is unclear if the speedup will be large or not.
>
> Cheers
>
> Klaus
>

Even if only a small speedup is achieved, there is ground for trying to parallelize everything. Intel has Larrabee coming, and in a couple of years we'll have a many-core CPU that will do everything. If Mesa is parallelizable at the driver level, then it would be a nice open-source alternative to what Intel will be offering and a viable solution for fast graphics. After all, parallelizing Mesa is not that innovative - NVIDIA and ATI do that at a lower level inside their solutions: they take OpenGL operations, break them down into pieces and then schedule them on their own proprietary system-on-a-chip (which we call a GPU).
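The per-thread span idea quoted above (thread 0 always works on ctx->span[0], thread 1 on ctx->span[1], and so on) could be sketched roughly as follows. This is only an illustration under stated assumptions: demo_span and demo_context are hypothetical, heavily reduced stand-ins and not the real structures from s_span.h and s_context.h; rasterize_rows and MAX_THREADS are made up for the example. It compiles with or without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 8
#define MAX_WIDTH   64

/* Hypothetical, reduced stand-ins for the span and context structures. */
struct demo_span {
    int x, y, end;
    unsigned char value[MAX_WIDTH];
};

struct demo_context {
    struct demo_span span[MAX_THREADS];   /* one scratch span per thread */
};

/* Index of the calling thread, or 0 when built without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Fill 'rows' rows of 'width' pixels. Each thread fills its own span,
 * then copies it to the framebuffer, so no two threads share a span. */
void rasterize_rows(struct demo_context *ctx, unsigned char *fb,
                    int rows, int width)
{
    assert(width <= MAX_WIDTH);

    /* num_threads caps the team size so thread_id() < MAX_THREADS. */
    #pragma omp parallel for num_threads(MAX_THREADS) schedule(static)
    for (int y = 0; y < rows; y++) {
        struct demo_span *span = &ctx->span[thread_id()];
        span->x = 0;
        span->y = y;
        span->end = width;
        for (int i = 0; i < width; i++)          /* "span processing" */
            span->value[i] = (unsigned char)((i + y) & 0xff);
        for (int i = 0; i < width; i++)          /* write-out */
            fb[y * width + i] = span->value[i];
    }
}
```

Since each row writes to a disjoint part of the framebuffer and each thread owns its span, the loop body needs no locking. Real triangle rasterization has more shared state than this (depth buffer, stencil, blending), so this shows the indexing scheme only, not a claim that the change would be that simple in Mesa.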