On 2003.06.23 19:12:01 +0200, Keith Whitwell wrote:
Ian Romanick wrote:
As another data point, I have attached my very old patch to enable the 3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is now quite outdated. There were a couple of reasons I did not commit any of this.
Thanks a lot. It seems I have missed some other bits, for example the fallback in radeon_compat.
I knew there was a reason I kept that code around. :)
1. A lot of it (i.e., calculate_max_texture_levels) would be superseded by the texmem branch (which has now been merged to the trunk).
2. Enabling the 3rd TMU can drastically reduce the maximum available texture size on some memory configurations. This is even more significant on the R200 which has 6 TMUs.
On my 64MB Radeon 7500 the max texture size is 2048 with 2 TMUs and
1024 with 3 TMUs. It shouldn't be a problem for 32MB versions as long as
the max texture size stays at least as big as the minimum OpenGL requires (256?).
If you really don't need the 3rd TMU, you can switch support off via an
environment variable.
And we could switch the 3rd TMU off automatically if the resulting max
texture size gets too small, then recalculate it. But I doubt there
are 16MB Radeons available.
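To put rough numbers on it: a full RGBA mipmap pyramid costs about 4/3 of the base level, so 3 TMUs x 2048^2 x 4 bytes x 4/3 is roughly 67MB, which no longer fits on a 64MB card once the framebuffer is accounted for; dropping to 1024 brings the worst case down to ~17MB. A minimal sketch of the auto-disable idea, where availableTexMem and the helper names are made up for illustration rather than taken from the driver:

#define RADEON_HW_MAX_TEXTURE_SIZE 2048
#define GL_MIN_TEXTURE_SIZE        64   /* spec minimum for GL_MAX_TEXTURE_SIZE */

static int max_texture_size(long availableTexMem, int nrTMUs)
{
   /* Worst case: every TMU binds a full mipmapped RGBA texture.
    * 64MB card: 3 * 2048*2048*4 * 4/3 ~= 67MB -> too big, the limit
    * drops to 1024; with 2 TMUs, ~45MB -> 2048 still fits. */
   int size = RADEON_HW_MAX_TEXTURE_SIZE;
   while (size > GL_MIN_TEXTURE_SIZE &&
          (long) nrTMUs * size * size * 4 * 4 / 3 > availableTexMem)
      size >>= 1;
   return size;
}

static int choose_nr_tmus(long availableTexMem, int wantTMUs, int minSize)
{
   /* Fall back to 2 TMUs if enabling them all would push the max
    * texture size below what we consider acceptable. */
   int tmus = wantTMUs;
   while (tmus > 2 && max_texture_size(availableTexMem, tmus) < minSize)
      tmus--;
   return tmus;
}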
For the R200: I think only 64MB and 128MB versions exist. And we can make it support a maximum of 2, 4, or 6 TMUs via an environment variable or via the upcoming config stuff. Most programs/games use at least 2 TMUs; newer ones use 4, maybe 3, and some even use 6 TMUs if they are available.
For mobile systems, there are M6 chips with as little as 8MB. I think there are mobile R200-derived chips with as little as 16MB or 32MB. I'm not 100% sure on that, though.
From an application perspective, why should we penalize texture quality across the board for worst-case situations that may or may not ever happen? Like I said before, how frequently will an application try to bind a 2048x2048 texture to all of the available texture units? If the answer is never, why should the app be forced to use 512x512 textures for everything just so that we can be "safe"?
This may be a place where we should use a config option. A slider for selecting the maximum texture size should work.
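If it ends up as an environment variable first, it could be as simple as the sketch below; the variable name RADEON_MAX_TEXTURE_SIZE is invented here, and the real thing should hang off the upcoming config mechanism instead:

#include <stdlib.h>

static int clamp_max_texture_size(int computedMax)
{
   const char *env = getenv("RADEON_MAX_TEXTURE_SIZE");
   if (env) {
      int wanted = atoi(env);
      /* accept only sane power-of-two values within the computed limit */
      if (wanted >= 64 && wanted <= computedMax &&
          (wanted & (wanted - 1)) == 0)
         return wanted;
   }
   return computedMax;
}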
3. There are some problems with some fast-pathing in the vtxfmt code. The code assumes that the allowable range for 'target' (see radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls glMultiTexCoord2fv with a target of 3 (assuming the mask value is changed from 1 to 3), the driver will explode.
I tried to just allocate a dummy (texcoordptr[3]) and let it point to tex0 or so.
A modified multiarb.c which used 4 TMUs even if the driver doesn't support them at least didn't crash.
That was the solution I had thought of, too.
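Something along these lines, where the array and field names are stand-ins for the actual vtxfmt state rather than the real code:

#include <GL/gl.h>

#define HW_UNITS 3   /* texture units the driver actually supports */
#define GL_UNITS 8   /* units an app may legally name via GL_TEXTUREn */

static GLfloat dummy_texcoord[4];        /* harmless scratch target */
static GLfloat *texcoordptr[GL_UNITS];   /* indexed by GL texture unit */

static void setup_texcoord_pointers(GLfloat *vb_tex[HW_UNITS])
{
   int i;
   for (i = 0; i < GL_UNITS; i++)
      texcoordptr[i] = (i < HW_UNITS)
         ? vb_tex[i]        /* real slot in the current vertex */
         : dummy_texcoord;  /* out-of-range targets land here */
}

/* glMultiTexCoord2fv(GL_TEXTURE0 + unit, v) then just does
 *    texcoordptr[unit][0] = v[0];
 *    texcoordptr[unit][1] = v[1];
 * and cannot scribble outside the vertex even for unit >= HW_UNITS. */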
4. A similar problem to #3 exists with the codegen path. The fast paths selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c, line 354) and friends may not be expandable to the 3 (or 6 for R200) TMU cases.
the dummy should help in this case, too.
I don't think it helps here because the texcoord data is treated as a simple array. The assumption is that texcoordptr[0][x*n] is the same as texcoordptr[x]. We can't put any padding in the vertex buffer itself. The hardware won't let us.
I think the best bet is to use the fast path exactly as it is used now (if exactly TMUs 0 & 1 are enabled) and use the slower path otherwise. On R200 we could also use the fast path if TMUs 0 & 1 or 0 & 1 & 2 & 3 are enabled. I'm not sure the payoff would be worth the effort, though.
At worst a test can be used in this code. If there's no sane way to avoid it, we have to do it & that's that.
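In sketch form, with unitEnabledMask standing in for whatever state the driver actually tracks:

#include <GL/gl.h>

/* Restrict the packed-array fast path to the layouts the generated
 * code can handle: the enabled units must form one of the contiguous
 * runs starting at unit 0 that the fast path was written for. */
static GLboolean can_use_fast_path(unsigned unitEnabledMask, GLboolean isR200)
{
   if (unitEnabledMask == 0x3)             /* units 0 & 1 */
      return GL_TRUE;
   if (isR200 && unitEnabledMask == 0xf)   /* units 0-3, R200 only */
      return GL_TRUE;
   return GL_FALSE;                        /* everything else: slow path */
}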
The first issue is a non-issue now. My original intention, before discovering the second issue, was to "merge" the patch after merging the texmem branch. It turns out that it took much longer to make the branch mergeable than initially anticipated.
I think we're going to have to wrestle with the second issue at some point. When the next round of texmem work is complete, we won't be able to predict a priori how big the largest texture set can be. Even now, I find it unlikely that on an R200 there would be six 2048x2048 cube maps (the worst case) bound at any time. This renders the current calculation somewhat bogus to begin with. It seems that the existing closed-source drivers just advertise the hardware maximum in all cases.
If the hardware maximum is advertised, then an app could bind a set of textures that can't fit in memory at once. The driver would then have to fall back to software. I believe the open-source drivers used to function this way, but doing so caused problems with Quake2. I'm really not sure what the right solution is.
Correct - and in fact they still should function this way if the situation somehow arises that the bound textures can't all be uploaded.
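The validation-time check that implies could be as small as the sketch below; the size bookkeeping and the fallback call are placeholders, not the actual texmem interfaces:

/* Sum the sizes of the currently bound textures and report whether
 * they can all be resident at once. */
static int bound_textures_fit(const long texSize[], int nrBound,
                              long availableTexMem)
{
   long total = 0;
   int i;
   for (i = 0; i < nrBound; i++)
      total += texSize[i];
   return total <= availableTexMem;
}

/* hypothetical caller, at state-validation time:
 *
 *    if (!bound_textures_fit(sizes, n, texmem_size))
 *       FALLBACK( rmesa, FALLBACK_TEXTURE, GL_TRUE );
 */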
Or adding a fallback to multipass-rendering with 2 TMUs before fallback to sw-rendering. That might get a bit tricky.
Tricky isn't the word. Downright horrible is the word! I thought about this once WRT implementing some missing parts of ARB_texture_env_combine for MGA. The interactions with the stencil buffer, depth buffer, and other subtle bits made my head hurt.