On Wednesday, 09.02.2005, at 22:12 +0100, Felix Kühling wrote:
> On Wednesday, 09.02.2005, at 20:58 +0100, Roland Scheidegger wrote:
[snip]
> > Performance with gart texturing, even in 4x mode, takes a big hit
> > (almost 50%).
> > I was not really able to get consistent performance results when both
> > texture heaps were active; I guess it's luck of the draw which textures
> > got put in the gart heap and which ones in the local heap. But the fact
> > that performance actually improved with a smaller gart heap is not a good
> > sign. And even though the maximum obtained in rtcw with 35MB local heap and
> > 29MB gart heap was higher than the score obtained with 35MB local heap
> > alone, there were clearly areas which ran faster with only the local heap.
> > It seems to me that the allocator really should try harder to use the
> > local heap to be useful on r200 cards. Moreover, when you DO have to put
> > textures into the gart heap, you'd likely get quite a bit better
> > performance by revisiting them later, once more space becomes available
> > on the local heap, and uploading the still-used textures from the gart
> > heap to the local heap (this should in fact be even faster than those
> > 650MB/s, since no in-kernel copy would be needed; it should be possible
> > to blit them directly).
>
> The big problem with the current texture allocator is that it can't tell
> which areas are really unused. Texture space is only allocated and never
> freed. Once the memory is "full" it starts kicking textures to upload
> new ones. This is the only way of "freeing" memory. Using an LRU
> strategy it has a good chance of kicking unused textures first, but
> there's no guarantee. It can't tell if a kicked texture will be needed
> the next instant. So trying to move textures from GART to local memory
> would basically mean that you blindly kick the least recently used
> texture(s) from local memory. If those textures are needed again soon
> then performance is going to suffer badly.
>
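To make the LRU behaviour concrete, here is a toy C model (not the real texmem.c code): textures sit on a list with the most recently used one at the head, as `move_to_head( & heap->texture_objects, t )` arranges in the patch, and when the heap is full the allocator kicks from the tail. The names `struct tex`, `struct heap`, `touch` and `alloc_lru` are hypothetical simplifications.

```c
#include <assert.h>

/* Hypothetical stand-ins for driTextureObject / driTexHeap: a doubly
 * linked list with a sentinel, most recently used texture at the head. */
struct tex {
    struct tex *prev, *next;
    unsigned size;
    int resident;               /* 1 while the texture occupies heap space */
};

struct heap {
    struct tex list;            /* list.next = MRU, list.prev = LRU */
    unsigned capacity, used;    /* bytes */
};

static void heap_init(struct heap *h, unsigned capacity)
{
    h->list.next = h->list.prev = &h->list;
    h->capacity = capacity;
    h->used = 0;
}

static void unlink_tex(struct tex *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
}

static void insert_head(struct heap *h, struct tex *t)
{
    t->next = h->list.next;
    t->prev = &h->list;
    h->list.next->prev = t;
    h->list.next = t;
}

/* A texture was used: make it the most recently used one. */
static void touch(struct heap *h, struct tex *t)
{
    unlink_tex(t);
    insert_head(h, t);
}

/* Upload 't' (size bytes), kicking textures from the LRU end until it
 * fits. Nothing here knows whether a victim is needed again soon. */
static void alloc_lru(struct heap *h, struct tex *t, unsigned size)
{
    assert(size <= h->capacity);
    while (h->capacity - h->used < size) {
        struct tex *victim = h->list.prev;      /* least recently used */
        unlink_tex(victim);
        victim->resident = 0;
        h->used -= victim->size;
    }
    t->size = size;
    t->resident = 1;
    h->used += size;
    insert_head(h, t);
}
```

With a 100-byte heap holding textures a and b (40 bytes each, a touched last), uploading a third 40-byte texture kicks b, whether or not b is needed in the very next frame.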
> Therefore I'm proposing a modified allocator that fails when it needs to
> start kicking too recently used textures (e.g. textures used in the
> current or previous frame). Failure would not be fatal in this case, you
> just keep the texture in GART memory and try again later. Actually you
> could use the same allocator for normal texture uploads. Just specify
> the current texture heap age as the limit.
>
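A minimal sketch of the proposed age-limited allocator, assuming each texture records the texture heap age at which it was last used. `alloc_limited` and the array-based heap are hypothetical; a real implementation would walk the LRU list instead of scanning for the oldest entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified model: each texture records the texture heap
 * age at which it was last used; 'resident' says whether it holds space. */
struct tex {
    unsigned size;
    unsigned last_age;   /* heap age at last use */
    int resident;
};

struct heap {
    struct tex *texs;    /* textures that may be resident in this heap */
    size_t count;
    unsigned capacity, used;
};

/* Resident texture with the oldest last_age, or NULL if none. */
static struct tex *oldest_resident(struct heap *h)
{
    struct tex *best = NULL;
    for (size_t i = 0; i < h->count; i++) {
        struct tex *t = &h->texs[i];
        if (t->resident && (best == NULL || t->last_age < best->last_age))
            best = t;
    }
    return best;
}

/* Kick least recently used textures, but fail (return 0) instead of
 * kicking any texture with last_age >= limit. Failure is not fatal:
 * the caller keeps the texture where it is and tries again later. */
static int alloc_limited(struct heap *h, struct tex *t, unsigned limit)
{
    assert(!t->resident);
    while (h->capacity - h->used < t->size) {
        struct tex *victim = oldest_resident(h);
        if (victim == NULL || victim->last_age >= limit)
            return 0;            /* would kick a too recently used texture */
        victim->resident = 0;
        h->used -= victim->size;
    }
    t->resident = 1;
    h->used += t->size;
    return 1;
}
```

Following the proposal, a migration attempt would pass a limit that protects textures used in the current or previous frame, while a normal upload would pass the current texture heap age.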
> If you try to move textures back to local memory each time a texture is
> used, this would result in some kind of automatic regulation of heap
> usage. By kicking only textures that are several frames old in this
> process, you'd avoid thrashing.
>
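Combining the age-limited eviction with a migration attempt on every use gives roughly the following. This is a sketch under simplifying assumptions: one local heap with byte accounting, GART space is not tracked, and kicked textures are simply marked as swapped out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum where { SWAPPED_OUT, IN_GART, IN_LOCAL };

/* Hypothetical model, not the texmem.c structures. */
struct tex {
    unsigned size;
    unsigned last_age;    /* texture heap age at last use */
    enum where loc;
};

struct ctx {
    struct tex *texs;
    size_t count;
    unsigned local_capacity, local_used;
};

/* Oldest texture currently in local memory, or NULL. */
static struct tex *oldest_local(struct ctx *c)
{
    struct tex *best = NULL;
    for (size_t i = 0; i < c->count; i++) {
        struct tex *t = &c->texs[i];
        if (t->loc == IN_LOCAL && (best == NULL || t->last_age < best->last_age))
            best = t;
    }
    return best;
}

/* Free 'size' bytes of local memory by kicking only textures whose
 * last_age is below 'limit'; return 0 rather than kick newer ones. */
static int make_local_room(struct ctx *c, unsigned size, unsigned limit)
{
    while (c->local_capacity - c->local_used < size) {
        struct tex *victim = oldest_local(c);
        if (victim == NULL || victim->last_age >= limit)
            return 0;
        victim->loc = SWAPPED_OUT;      /* re-uploaded on its next use */
        c->local_used -= victim->size;
    }
    return 1;
}

/* Called every time a texture is used: record the age and, if it sits in
 * GART memory, opportunistically try to move it to local memory. */
static void texture_used(struct ctx *c, struct tex *t,
                         unsigned heap_age, unsigned limit)
{
    assert(t->loc != SWAPPED_OUT);      /* caller uploads it first */
    t->last_age = heap_age;
    if (t->loc == IN_GART && make_local_room(c, t->size, limit)) {
        /* a real driver would blit the image from GART to local here */
        t->loc = IN_LOCAL;
        c->local_used += t->size;
    }
}
```

Because a GART texture only moves when older textures can be evicted, migration stops by itself once every local texture belongs to the current or previous frame, which is the self-regulation described above.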
> Currently the texture heap age is only incremented on lock contention
> (IIRC). In this scheme you'd also increment it on buffer swaps and
> remember the texture heap ages of the last two buffer swaps.
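The heap age bookkeeping could look like this (a hypothetical sketch; in the real drivers the heap age is shared between clients and the names differ). With the ages at the last two swaps remembered, a texture whose last-use age is at least `swap_age[1]` was used in the current or previous frame.

```c
#include <assert.h>

/* Hypothetical bookkeeping: the texture heap age is bumped on lock
 * contention (as now) and additionally on every buffer swap, and the
 * ages at the last two swaps are remembered. */
struct age_tracker {
    unsigned age;          /* current texture heap age */
    unsigned swap_age[2];  /* [0]: age at the latest swap, [1]: the one before */
};

static void on_lock_contention(struct age_tracker *a)
{
    a->age++;
}

static void on_buffer_swap(struct age_tracker *a)
{
    a->age++;
    a->swap_age[1] = a->swap_age[0];
    a->swap_age[0] = a->age;
}

/* Textures last used at an age >= this value were used in the current
 * or the previous frame and must not be kicked for a migration. */
static unsigned kick_limit(const struct age_tracker *a)
{
    return a->swap_age[1];
}
```

The value returned by `kick_limit` is what the age-limited allocator would receive as its limit when moving textures back to local memory.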
I simplified this idea a little further and attached a patch against
texmem.[ch]. It frees stale textures (and also placeholders for other
clients' textures) that haven't been used for 1 second when it runs out of
space on a texture heap. This way it will try a bit harder to put
textures into the first heap before using the second heap, without much
risk (I hope) of performance regressions.
I tested this on a ProSavageDDR where rendering speed appears to be the
same with local and GART textures. There was no measurable performance
regression in Quake3 and I noticed no subjective performance regression
in Torcs or Quake1 either.
Now the only thing missing in texmem.c for migrating textures from GART
to local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap).
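The missing flag could look roughly like this. It models only the per-heap loop from the patch (try `mmAllocMem`, free stale textures, retry) with each heap reduced to byte counts; `first_heap_only`, `try_heap` and `allocate_texture` are hypothetical names, not an existing driAllocateTexture parameter.

```c
#include <assert.h>

/* Hypothetical heap model: free bytes, plus the bytes that freeing
 * stale textures on this heap would reclaim. */
struct heap {
    unsigned free_bytes;
    unsigned stale_bytes;
};

/* One iteration of the patched loop for a single heap. */
static int try_heap(struct heap *h, unsigned size)
{
    if (h->free_bytes < size) {             /* mmAllocMem() failed */
        h->free_bytes += h->stale_bytes;    /* driFreeStaleTextures() */
        h->stale_bytes = 0;
        if (h->free_bytes < size)
            return 0;
    }
    h->free_bytes -= size;
    return 1;
}

/* Returns the index of the heap that took the texture, or -1. With
 * first_heap_only set (e.g. when migrating from GART to local memory),
 * give up after the first heap instead of trying the others. */
static int allocate_texture(struct heap *heaps, int nr_heaps,
                            unsigned size, int first_heap_only)
{
    for (int id = 0; id < nr_heaps; id++) {
        if (try_heap(&heaps[id], size))
            return id;
        if (first_heap_only)
            break;
    }
    return -1;
}
```

For migration, a failure with `first_heap_only` set is harmless: the texture simply stays in GART memory and a later use can try again.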
Anyway, I think the attached patch should already make a difference as
it is. I'd be interested how much it improves your performance numbers
with Quake3 and rtcw on r200 when both texture heaps are enabled.
>
[snip]
Regards,
Felix
--
| Felix Kühling <[EMAIL PROTECTED]> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |
--- ./texmem.h.~1.6.~ 2005-02-02 17:20:40.000000000 +0100
+++ ./texmem.h 2005-02-10 17:44:40.000000000 +0100
@@ -101,6 +101,11 @@
* value must be greater than
* or equal to \c firstLevel.
*/
+
+ double clockAge; /**< Clock time stamp indicating when
+ * the texture was last used. The unit
+ * is seconds.
+ */
};
--- ./texmem.c.~1.10.~ 2005-02-05 14:16:25.000000000 +0100
+++ ./texmem.c 2005-02-10 18:39:15.000000000 +0100
@@ -50,6 +50,7 @@
#include "texformat.h"
#include <assert.h>
+#include <sys/time.h>
@@ -243,6 +244,13 @@
*/
move_to_head( & heap->texture_objects, t );
+ {
+ struct timeval tv;
+ if ( gettimeofday( &tv, NULL ) == 0 ) {
+ t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+ } else
+ t->clockAge = 0.0;
+ }
for (i = start ; i <= end ; i++) {
@@ -415,6 +423,15 @@
t->heap = heap;
if (in_use)
t->bound = 99;
+
+ {
+ struct timeval tv;
+ if ( gettimeofday( &tv, NULL ) == 0 ) {
+ t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+ } else
+ t->clockAge = 0.0;
+ }
+
insert_at_head( & heap->texture_objects, t );
}
}
@@ -477,6 +494,50 @@
+/**
+ * Free stale textures
+ *
+ * \param heap The heap from which to kick stale textures
+ * \param seconds Kick textures unused for this many seconds
+ */
+
+static void
+driFreeStaleTextures( driTexHeap * heap, double seconds )
+{
+ driTextureObject * temp;
+ driTextureObject * cursor;
+ struct timeval tv;
+ double curTime;
+ if ( gettimeofday( &tv, NULL ) != 0 )
+ return;
+ curTime = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+
+ if ( heap == NULL )
+ return;
+
+ for ( cursor = heap->texture_objects.prev, temp = cursor->prev;
+ cursor != &heap->texture_objects ;
+ cursor = temp, temp = cursor->prev ) {
+
+ /* only consider our own textures that are not currently bound */
+ if ( cursor->bound || !cursor->tObj ) {
+ continue;
+ }
+
+ if ( curTime - cursor->clockAge > seconds ) {
+ driSwapOutTextureObject( cursor );
+ }
+ /* Since textures are LRU sorted, it should be safe to terminate
+ * this loop once the first texture is kept. */
+ else {
+ break;
+ }
+ }
+}
+
+
+
+
#define INDEX_ARRAY_SIZE 6 /* I'm not aware of driver with more than 2 heaps */
/**
@@ -514,7 +575,7 @@
/* Run through each of the existing heaps and try to allocate a buffer
- * to hold the texture.
+ * to hold the texture. If this fails, free stale textures and try again.
*/
for ( id = 0 ; (t->memBlock == NULL) && (id < nr_heaps) ; id++ ) {
@@ -522,6 +583,11 @@
if ( heap != NULL ) {
t->memBlock = mmAllocMem( heap->memory_heap, t->totalSize,
heap->alignmentShift, 0 );
+ if ( t->memBlock == NULL ) {
+ driFreeStaleTextures( heap, 1.0 );
+ t->memBlock = mmAllocMem( heap->memory_heap, t->totalSize,
+ heap->alignmentShift, 0 );
+ }
}
}