I'm not sure of the merits of OpenMP vs. pthreads, but a few comments: as Ben suggested, probably the best way to take advantage of multiple processors is to begin by parallelizing rasterization. Fragment processing (in particular when there's lots of texture sampling or non-trivial fragment shaders) is the typical bottleneck. In other cases, vertex transformation could be the bottleneck, but I don't think that's as often the case as fragment processing.
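To illustrate why fragment processing parallelizes well: fragments on different rows of a tile are independent, so a loop like the sketch below could be split across cores with a single OpenMP directive (or equivalently farmed out to a pthreads pool). shade_span() and struct tile here are made-up names for illustration, not Mesa internals.

/* Hypothetical sketch: shade the rows of a tile in parallel.
 * shade_span() and struct tile are illustrative stand-ins, not
 * actual Mesa code.  Each row is independent, so the iterations
 * can be divided among cores without synchronization. */
struct tile {
   int width, height;
   /* ... color/depth buffers, per-tile triangle list ... */
};

static void shade_span(struct tile *t, int y)
{
   /* run the fragment pipeline (texture sampling, shading,
    * blending) for every fragment on row y of this tile */
}

void shade_tile(struct tile *t)
{
   int y;
#pragma omp parallel for schedule(dynamic)
   for (y = 0; y < t->height; y++)
      shade_span(t, y);
}

schedule(dynamic) is there because rows can cover very different numbers of fragments, so static partitioning would load-balance poorly.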
If you want to start looking at code, I'd suggest you work on the gallium softpipe driver, as that's the future. Also, you might look at the Cell driver which I've been working on. It divides the window into tiles which are rendered by the Cell's SPUs.

I'm not too familiar with OpenMP. When you talk about OpenMP, are you talking about parallelizing across multiple machines and simulating shared memory across a network? Or is it just for shared-memory multiprocessors? I wouldn't bother with the former. I'm pretty comfortable with pthreads so I wouldn't hesitate to go that way.
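Roughly, the binning step of the tile scheme Ben describes (quoted below) could be sketched like this. All the names here are made up for the example; this is not softpipe or Cell code.

/* Hypothetical sketch of binning: a triangle is pushed onto the
 * queue of every 64x64 tile its bounding box overlaps, so a
 * triangle spanning several tiles is deliberately duplicated. */
#define TILE_SIZE 64
#define MAX_TRIS_PER_TILE 1024

struct tri { float xmin, ymin, xmax, ymax; /* vertices, attribs... */ };

struct tile_queue {
   const struct tri *tris[MAX_TRIS_PER_TILE];
   int count;   /* real code would need a lock or atomic here */
};

static void tile_queue_push(struct tile_queue *q, const struct tri *t)
{
   if (q->count < MAX_TRIS_PER_TILE)
      q->tris[q->count++] = t;
}

void bin_triangle(struct tile_queue *tiles, int tiles_per_row,
                  const struct tri *t)
{
   const int tx0 = (int) t->xmin / TILE_SIZE;
   const int ty0 = (int) t->ymin / TILE_SIZE;
   const int tx1 = (int) t->xmax / TILE_SIZE;
   const int ty1 = (int) t->ymax / TILE_SIZE;
   int tx, ty;

   for (ty = ty0; ty <= ty1; ty++)
      for (tx = tx0; tx <= tx1; tx++)
         tile_queue_push(&tiles[ty * tiles_per_row + tx], t);
}

Each rasterization thread then drains one tile's queue at a time, clipping triangles to the tile bounds, so no two threads ever touch the same pixels.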
-Brian

Ioannis Papadopoulos wrote:
> I'll look into that; it seems like a good starting point and is
> coarse-grained enough. Thanks.
>
> I have spent some time looking into the Larrabee architecture and it
> was basically my inspiration: at some point, all these processors will
> probably live in the main CPU, which will contain hundreds of simple
> cores, plus just a very simple piece of hardware to actually show the
> buffer on the screen. Then it makes sense to have a scalable solution
> for software OpenGL on many cores.
>
> Ben Harper wrote:
>> I'm pretty ignorant of the Mesa internals, but my first stab at such
>> a thing would be to try to parallelize the triangle rasterizer by
>> splitting the framebuffer into tiles of, say, 64x64 pixels, and
>> having a queue for each of those tiles. Then you have a pool of
>> rasterization threads that consume the work in each queue. You'd need
>> to duplicate triangles that fall into multiple tiles, but I'm
>> guessing that would be the worst of the work. In case you haven't
>> already, I would recommend reading Intel's paper on Larrabee that was
>> presented this year at SIGGRAPH.
>>
>> Ben
>>
>> On Thu, Oct 23, 2008 at 9:18 AM, Ioannis Papadopoulos
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I'm interested in parallelizing some parts of Mesa using OpenMP. I
>>> don't know if anyone has tried it, but I think it's worth a shot.
>>>
>>> I'm aware of pMesa, but it's not exactly what I have in mind. I'm
>>> more interested in seeing how well Mesa would behave on a manycore
>>> chip (although none is available currently). More specifically, I
>>> want to see what speedups I can get using 2-16 cores and OpenMP,
>>> without any hardware support. For example, on my small Core2 Duo,
>>> the nVidia 8400M GS is 4-5x faster in OpenGL than Mesa using one
>>> processor in glxgears; theoretically, if I can parallelize decently
>>> (minimizing OpenMP barriers, keeping good load balancing), 6 cores
>>> could match the nVidia chip.
>>>
>>> Is there any profiling information to see where most of the time is
>>> spent and what the critical path is? I'm mostly interested in loops
>>> (even hand-unrolled ones) and discrete tasks that could potentially
>>> be offloaded to an OpenMP thread.
>>>
>>> I started looking into src/math/m_matrix.c for my first tests, but I
>>> need some more information, such as:
>>> 1) which functions are called over and over again in real OpenGL
>>> applications (and here the actual low-level functions are the
>>> interesting ones, not the OpenGL API functions),
>>> 2) which parts can run in parallel (data dependencies etc.),
>>> 3) whether anyone else is interested in this attempt.
>>>
>>> I'm not in favor of pthreads since: 1) you need to build a runtime
>>> system that schedules the tasks and does the load balancing, and you
>>> may lose scalability if you don't do it as well as possible, 2) it
>>> is error-prone, and 3) you have to do all the fine-tuning by hand.
>>> On the other hand, OpenMP is quite easy to program even for
>>> beginners, most major compilers support it really well, and it
>>> scales very well. However, being a data-parallelism technique, it
>>> requires some additional effort to express discrete tasks. As a
>>> starting point I think it is a very decent solution.
>>>
>>> Thanks in advance,
>>> Giannis
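P.S. I'm no OpenMP expert, but the point-transform loops you mention in src/math/m_matrix.c look like the easy case for it: independent iterations, one directive. A minimal sketch, assuming a plain array-of-vec4 layout and a column-major 4x4 matrix as in OpenGL (transform_points4() here is illustrative, not the actual Mesa function):

/* Hypothetical sketch: transform n 4-component points by a 4x4
 * column-major matrix, parallelizing the outer loop with OpenMP.
 * Illustrative only -- not the actual m_matrix.c code. */
void transform_points4(float (*out)[4], const float m[16],
                       const float (*in)[4], int n)
{
   int i;
#pragma omp parallel for
   for (i = 0; i < n; i++) {
      const float x = in[i][0], y = in[i][1];
      const float z = in[i][2], w = in[i][3];
      out[i][0] = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
      out[i][1] = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
      out[i][2] = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
      out[i][3] = m[3] * x + m[7] * y + m[11] * z + m[15] * w;
   }
}

Whether that actually wins depends on the batch size, though; for a handful of vertices the fork/join overhead will swamp the arithmetic.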