Hi,

>> On the other hand I could very well believe that if there were
>> sufficiently large spans, this parallelization would help a lot, but then
>> this would be such an exceptional situation that is very far away from
>> actual uses of Mesa.
>>
>
> That's actually quite interesting. However, I think that you are being
> too fine-grained, meaning that you chop up the work too finely and then
> execute each of the very small chunks in parallel, synchronizing
> at the end (there is an implicit barrier at the end of each #pragma omp
> parallel) and thus spending more time synchronizing than computing.
>
> The best would be to fine-grain as much as possible and then execute in
> parallel without synchronization in between.

I agree with your analysis.  This is why I would try to go one step up.
Instead of splitting up each span, one could try to parallelize the
rendering of single triangles: set up a number of spans and let different
threads consume them.  Since each thread works on a different y-coordinate,
there should not be much data dependency between them.

If a typical triangle has a height of 15 pixels, then instead of spawning
threads 15 times (once per span), one would spawn them only once per
triangle...  I don't know if that is enough to make it worthwhile.  In any
case this would be an easy and cheap approach.  Within several days one
should probably have a prototype to test whether there is a performance
gain or not.
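
To make the per-triangle idea concrete, here is a minimal sketch in C with
OpenMP.  All names in it (struct span, fill_span, fill_triangle_spans, the
fixed-size framebuffer) are made up for illustration and are not Mesa's
actual span code.  The only point is that there is a single parallel
region, and hence a single implicit barrier, per triangle rather than one
per span.  Each span writes to its own row, so the threads never write to
the same pixel.

```c
#include <assert.h>
#include <stdint.h>

#define FB_WIDTH  64   /* assumed toy framebuffer size */
#define FB_HEIGHT 64

/* Hypothetical span record: one horizontal run of pixels on row y. */
struct span {
    int y;        /* row index */
    int x_start;  /* first pixel, inclusive */
    int x_end;    /* last pixel, exclusive */
};

static uint32_t framebuffer[FB_HEIGHT][FB_WIDTH];

/* Fill a single span.  It only touches row s->y. */
static void fill_span(const struct span *s, uint32_t color)
{
    for (int x = s->x_start; x < s->x_end; x++)
        framebuffer[s->y][x] = color;
}

/* Fill all spans of one triangle inside ONE parallel region.
 * The spans must lie on distinct rows, so iterations are independent. */
static void fill_triangle_spans(const struct span *spans, int count,
                                uint32_t color)
{
    assert(count >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < count; i++)
        fill_span(&spans[i], color);
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), the loop is
split across threads; without it the pragma is ignored and the loop runs
serially, which makes timing the two variants against each other easy.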

>
> I've been messing with the cell driver and seeing how the
> parallelization is done there for the SPEs. I guess that should be the
> way to go, maybe you want to take a look at it.
>

I don't know.  I think I read in another post here that the drawing
surface will be divided into small squares, and each of these squares will
be assigned to a separate thread, with all the problems of triangles
lying in several squares, etc.  This may be "the way to go", but I
fear I'm just too busy and not smart/motivated enough to take an approach
that requires rewriting the rendering pipeline at many, many steps.
Depending on how far along the Cell driver is, I imagine programming this
would take several weeks or months until everything is finished.
I have to agree that once the infrastructure is in place, it will probably
be quite easy to scale it up from 2 to 100 threads.
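
As a rough illustration of the "triangle lying in several squares" problem,
here is a small C sketch that computes which tiles a triangle's bounding
box overlaps.  The tile size of 32 pixels and all names (struct bbox,
struct tile_range, tiles_for_bbox, tile_count) are my own assumptions for
the example; this is not taken from the Cell driver.

```c
#include <assert.h>

#define TILE_SIZE 32  /* assumed tile edge length in pixels */

/* Inclusive pixel bounds of a triangle's bounding box. */
struct bbox { int x_min, y_min, x_max, y_max; };

/* Inclusive range of tile indices covered by a bounding box. */
struct tile_range { int tx_min, ty_min, tx_max, ty_max; };

/* Map a bounding box to the tiles it overlaps. */
static struct tile_range tiles_for_bbox(struct bbox b)
{
    assert(b.x_min >= 0 && b.y_min >= 0);
    assert(b.x_min <= b.x_max && b.y_min <= b.y_max);
    struct tile_range r = {
        b.x_min / TILE_SIZE, b.y_min / TILE_SIZE,
        b.x_max / TILE_SIZE, b.y_max / TILE_SIZE
    };
    return r;
}

/* Number of tiles (and hence per-tile work lists) the box lands in. */
static int tile_count(struct tile_range r)
{
    return (r.tx_max - r.tx_min + 1) * (r.ty_max - r.ty_min + 1);
}
```

A 15-pixel triangle that sits inside one tile is handled once, but the same
triangle straddling a tile corner has to be handed to up to four tiles,
and each of them must clip it to its own square.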

> Even if only a small speedup is achieved, then there is ground for trying to
> parallelize everything. Intel's having Larrabee and in a couple of years
> we'll have a many-core CPU that will do everything. If Mesa is
> parallelizable at the driver level, then it would be a nice open-source
> alternative to what Intel will be offering and a viable solution for
> fast graphics.

In the traditional approach (the one I proposed at the beginning), I can
believe that feeding a triangle of 15 spans to 2 or 3 threads might give a
speedup... giving it to 100 threads instead will certainly not give much
additional gain, since at most 15 of them would have a span to work on.
In this sense the other approach seems to be superior.
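
The arithmetic behind this can be written down in a few lines of C.  This
is only a back-of-the-envelope bound: it assumes every span costs the same
(in a real triangle they differ in length) and it ignores thread start-up
and synchronization costs entirely.  The function names are made up for
the example.

```c
#include <assert.h>

/* Largest number of spans any single thread has to process when
 * `spans` equal-cost spans are dealt out evenly to `threads` threads. */
static int max_spans_per_thread(int spans, int threads)
{
    assert(spans > 0 && threads > 0);
    return (spans + threads - 1) / threads;  /* ceiling division */
}

/* Upper bound on the speedup over one thread under these assumptions. */
static double ideal_speedup(int spans, int threads)
{
    return (double)spans / max_spans_per_thread(spans, threads);
}
```

For 15 spans this gives at best 1.875x with 2 threads and 3x with 3
threads.  With 100 threads the bound is 15x, the same as with 15 threads,
and 85 threads have nothing to do.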

Ciao

Klaus


_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev