José Fonseca wrote:
On Fri, Apr 04, 2003 at 10:08:36AM -0800, Ian Romanick wrote:
In principle, I think the producer/consumer idea is good. Why not implement known optimizations in it from the start? We already have *working code* to build formatted vertex data (see the radeon & r200 drivers), so why not build the object model from there? Each concrete producer class would have an associated vertex format. On creation, it would fill in a table of functions to put data in its vertex buffer. This could mean pointers to generic C functions, or it could mean dynamically generating code from assembly stubs.
The idea is that the functions from this table could be put directly in the dispatch table. This is, IMHO, critically important.
The various vertex functions then just need to call the object's produce method. This all boils down to putting a C++ face on a technique that has been demonstrated to work.
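The table-of-functions idea above can be sketched in plain C. This is a minimal illustration, not Mesa's actual code; all names (vertex_producer, emit_xyz, and so on) are made up for the example. The point is that the emit function is chosen once, when the vertex format is set up, so the per-vertex entry point does no format testing at all:

```c
#include <assert.h>
#include <string.h>

typedef struct vertex_producer vertex_producer;

/* One slot of the function table; in a driver, a pointer like this
 * would be installed directly in the GL dispatch table. */
typedef void (*emit_fn)(vertex_producer *p, const float *v);

struct vertex_producer {
    float buffer[1024];   /* staging area for formatted vertices */
    int   used;           /* floats written so far */
    int   vertex_size;    /* floats per vertex in this format */
    emit_fn emit_vertex;  /* chosen once, at format-change time */
};

/* One concrete emitter: plain xyz position, no color/texcoord. */
static void emit_xyz(vertex_producer *p, const float *v)
{
    memcpy(p->buffer + p->used, v, 3 * sizeof(float));
    p->used += p->vertex_size;
}

/* "Constructor" for the xyz-only format: fills in the table. */
static void producer_init_xyz(vertex_producer *p)
{
    p->used = 0;
    p->vertex_size = 3;
    p->emit_vertex = emit_xyz;
}
```

A glVertex3fv-style entry point would then just be `p->emit_vertex(p, v)`; swapping formats swaps the pointer, not the call site.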
I hope that integration of assembly generation with C++ is feasible but I see it as an implementation issue, regardless the preformance issues, which according to all who have replied aren't that neglectable as I though. The reason is that this kind of optimizations is very dependent of the vertex formats and other hardware details dificulting reusing the code - which is exactly what I want to avoid at this stage.
Realistically, both hardware and software use either array-of-structures or structure-of-arrays. Most hardware uses the former. At that point it becomes a matter of, for a given state vector, what's the offset of an element in the structure? The assembly code in the radeon & r200 drivers handles this very nicely.
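For an array-of-structures layout, the per-attribute offsets fall out mechanically from which attributes the state vector enables. A small sketch (attribute names and sizes are illustrative, not taken from any real driver):

```c
#include <assert.h>

enum { ATTR_POS, ATTR_NORMAL, ATTR_COLOR, ATTR_TEX0, ATTR_COUNT };

/* Size of each attribute in floats (illustrative values). */
static const int attr_size[ATTR_COUNT] = { 3, 3, 4, 2 };

/* Fill offsets[] for the attributes enabled in 'mask'; return the
 * total vertex stride in floats.  Disabled attributes get offset -1. */
static int compute_offsets(unsigned mask, int offsets[ATTR_COUNT])
{
    int stride = 0;
    for (int i = 0; i < ATTR_COUNT; i++) {
        if (mask & (1u << i)) {
            offsets[i] = stride;    /* attribute starts after the
                                       enabled ones before it */
            stride += attr_size[i];
        } else {
            offsets[i] = -1;
        }
    }
    return stride;
}
```

The drivers' assembly stubs effectively bake these offsets in at format-change time instead of recomputing them per vertex.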
I do have one question. Do we really want to invoke the producer on every vertex immediately? In the radeon / r200 drivers this just copies the whole vertex to a DMA buffer. Why not generate the data directly where it needs to go? I know that if the vertex format changes before the vertex is complete we need to copy out of the temporary buffer into the GL state vector, but that doesn't seem like the common case. At the very least, some people at Intel think generating data directly in DMA buffers is the way to go:
http://www.intel.com/technology/itj/Q21999/ARTICLES/art_4.htm
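The emit-straight-into-DMA idea, with the copy-out fallback for a mid-vertex format change, can be sketched like this. This is a toy model under stated assumptions (a plain array stands in for a mapped DMA buffer; all names are made up):

```c
#include <assert.h>
#include <string.h>

#define DMA_FLOATS 256

typedef struct {
    float dma[DMA_FLOATS];  /* stands in for a mapped DMA buffer */
    int   used;             /* floats of completed vertices */
    int   vertex_size;      /* floats per vertex, current format */
    int   partial;          /* floats of the in-progress vertex */
} dma_stream;

/* Append one float of the current vertex directly into DMA space;
 * commit the vertex once it is complete. */
static void dma_put(dma_stream *s, float f)
{
    s->dma[s->used + s->partial] = f;
    if (++s->partial == s->vertex_size) {
        s->used += s->vertex_size;   /* vertex complete */
        s->partial = 0;
    }
}

/* Format change mid-vertex (the uncommon case): rescue the partial
 * vertex out of the DMA buffer into the GL state vector. */
static int dma_rescue_partial(dma_stream *s, float *state)
{
    int n = s->partial;
    memcpy(state, s->dma + s->used, n * sizeof(float));
    s->partial = 0;
    return n;
}
```

The common path touches the DMA memory exactly once per float; the extra copy only happens on the (rare) incomplete-vertex format switch.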
This is a very interesting read. Thanks for the pointer.
It's complicated to know the vertices' positions in the DMA buffer from the beginning, especially because of clipping, since vertices can be added or removed. But if I understood correctly, it's still better to do it in DMA memory and move the vertices around to avoid cache misses. It can be very tricky, though: imagine that clipping generates vertices that no longer fit in the DMA buffer; what would be done then?
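One conceivable answer to the overflow question, sketched as a toy model (the capacity, names, and reservation policy are all assumptions, not anything a real driver does verbatim): bound the growth up front, since clipping a polygon against k planes can add at most one vertex per plane, and flush the buffer before emitting if the worst case won't fit.

```c
#include <assert.h>

#define DMA_CAPACITY 64   /* vertices per (toy) DMA buffer */
#define CLIP_PLANES  6    /* frustum planes */

typedef struct {
    int used;             /* vertices already in the buffer */
    int flushes;          /* how many times we kicked off DMA */
} dma_buf;

static void dma_flush(dma_buf *b)
{
    /* In a real driver this would fire the buffer at the hardware
     * and map a fresh one. */
    b->flushes++;
    b->used = 0;
}

/* Reserve room for an n-vertex polygon that clipping may grow by one
 * vertex per plane; flush first if the worst case cannot fit. */
static void reserve_clipped_poly(dma_buf *b, int n)
{
    int worst = n + CLIP_PLANES;
    if (b->used + worst > DMA_CAPACITY)
        dma_flush(b);
    b->used += worst;     /* pessimistic reservation */
}
```

The pessimistic reservation wastes some buffer space but guarantees clipping can never overrun the mapped region.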
I think the "online driver model" from the paper only works if you have a single loop that does all the processing. Since Mesa uses a pipeline, it would be very tricky. Using the "online driver model" for a card w/HW TCL would be a different story.
The thing I found most interesting is the issue of applying the TCL operations to all the vertices at once versus one vertex at a time. From previous discussions on this list it seems that nowadays most CPU performance is dictated by the cache, so the latter option really seems more efficient, but Mesa implements the former (the passes are even called "pipeline stages"), and changing that would mean a big overhaul of the TnL module.
This would be very, very, very tricky. We'd basically need several different super-loops depending on the GL state vector. The super-loops would go in the pipeline at the same place where the hardware TCL functions go. If the super-loop could do all the processing, the following TCL stages would be skipped.
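The super-loop selection could look something like the following sketch. Everything here is hypothetical (the state bits, the loop bodies, the selection policy); it only illustrates the shape of "pick a fused loop for this state vector, or fall back to the generic stage-by-stage pipeline":

```c
#include <assert.h>

/* Illustrative state-vector bits. */
#define STATE_LIT 0x1u
#define STATE_FOG 0x2u
#define STATE_TEX 0x4u

typedef int (*tnl_loop)(int nverts);   /* returns verts processed */

/* Fused loops would do transform + light + fog in one cache-friendly
 * pass; here they are stubs that just report the vertex count. */
static int loop_plain(int n)   { return n; }  /* position only */
static int loop_lit(int n)     { return n; }  /* position + lighting */
static int loop_generic(int n) { return n; }  /* stage-by-stage path */

/* Choose a super-loop for the state vector.  *is_fastpath tells the
 * caller whether the remaining pipeline stages can be skipped. */
static tnl_loop choose_loop(unsigned state, int *is_fastpath)
{
    switch (state) {
    case 0:
        *is_fastpath = 1;
        return loop_plain;
    case STATE_LIT:
        *is_fastpath = 1;
        return loop_lit;
    default:              /* no fused loop for this combination */
        *is_fastpath = 0;
        return loop_generic;
    }
}
```

The combinatorial problem is visible even in the sketch: every state combination that matters needs its own hand-built loop, which is exactly why only a few fast paths would ever be written.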
This sounds like the 'fastpath' stages which were common in drivers based on Mesa-3.x. We had a pipeline stage, supplied by most drivers, which was tuned to handle Quake 3 CVA-style rendering operations. It was pretty fast, but in the end not much faster than Mesa-4.x standard operation.
The fallback hardware tcl processing in the radeon drivers is installed as a pipeline stage also.
Keith
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel