Jeff Hartmann wrote:
>> That's not quite what I meant. Imagine the user has 40MB of on-card
>> memory available for textures. If that user runs an application that
>> uses two 2048x2048x32bpp textures per frame (each one weighing in at
>> ~22MB), we will either have to use AGP memory for one of them, or (if
>> the card is PCI with no PCIGART) it will have to copy 44MB across the
>> PCI bus every single frame.
>
>>> 2. We consider the block or group of blocks as an entire "unit";
>>> everything is done on units, not individual pieces of the blocks.
>>> That prevents someone swapping out the first page of a group of
>>> textures and someone else having to wait for just that block to
>>> come back.
>>
>> I believe that the block should be the unit used. If each block has
>> a group ID (the IDs that you talk about below) and a sequence number,
>> we can do some very nice optimizations. Imagine a case where we have
>> two textures, each of which uses 51% of the available texture space.
>> Performance would DIE if we had to bring in an entire texture every
>> single time. With a little optimization we can bring in only 2% of
>> texture memory each frame instead of 102%.
>
> Just a slight comment here: once the memory has actually made it out
> of the AGP aperture, the cost of getting it back is the same no matter
> how big the page is. Course I really like the idea of storing sequence
> numbers; it gives room for lots of flexibility. And for situations
> where we haven't fully made it out of the AGP aperture, the cost of
> the blit is much smaller.
In reality, to fit the second texture in on-card memory, we only have to reclaim 4MB. We could then hit a steady state where 18MB of each texture never moves out of on-card memory, and only 8MB of texture needs to be copied in each frame. If we view a texture (or any object) as an ordered sequence of blocks instead of a monolithic lump, we can make that optimization. Based on what you have written below, I think we're in agreement that this is the way to go. :)
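Treating a texture as an ordered sequence of blocks could be captured with something like the following standalone C sketch. The names (`tex_block`, `mb_copied_per_frame`) are hypothetical, not from the DRI tree; the function just encodes the 40MB/22MB steady-state arithmetic above, generalized on the assumption that only the overflow past on-card capacity has to stream each frame:

```c
#include <stdint.h>

/* Hypothetical per-block descriptor: a texture (or any object) is an
 * ordered run of blocks sharing one group ID, distinguished by a
 * sequence number. */
struct tex_block {
    uint32_t group_id;   /* which texture owns this block */
    uint32_t seq;        /* position of the block within its group */
    unsigned in_card:1;  /* currently resident in on-card memory */
};

/* Steady-state arithmetic from the example above: with two textures of
 * tex_mb each and card_mb of on-card memory, only the overflow ever has
 * to move, and both textures stream through that window each frame. */
static unsigned mb_copied_per_frame(unsigned card_mb, unsigned tex_mb)
{
    unsigned overflow = (2 * tex_mb > card_mb) ? 2 * tex_mb - card_mb : 0;
    return 2 * overflow;  /* 2 * 4MB = 8MB for the 40MB / 22MB case */
}
```

So 18MB of each texture stays resident and only the 4MB overflow of each one is copied per frame, matching the 8MB figure above.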
[big snip]
>> It would take some experiments to prove it, but I believe that simple
>> LRU or MRU is always suboptimal, in both the theoretical and the
>> practical case. On their own, (a) and (b) are too simple; they won't
>> give us enough functionality to do all the things we want. The memory
>> bit-map has the advantage that we can pick the largest free block and
>> try to reclaim memory around that free block until it is large enough
>> to satisfy the request. After having tried to do that with the
>> existing linked-list approach, I can honestly say that a linked list
>> is a very poor data structure for that purpose. On the flip side, the
>> bit-map doesn't store much useful information about the allocated
>> blocks, which makes it difficult to select which memory to reclaim
>> when memory is very full. The problem is that BOTH of these
>> situations are important performance cases!
>
> Okay, I have done some thinking, and I think I have a pretty good
> solution. In the normal case we try to get a completely unused portion
> of memory using the bit-vector freelist. If none is available and we
> can't grow AGP memory for some reason, we fall back onto another data
> structure: a priority heap (a priority queue or binary heap, depending
> on the data-structures book you look at). That should have good
> performance and, as far as size is concerned, makes more sense than a
> linked list, I think. We could implement the heap as just an array of
> index values into the pagelist. We might see some performance drain
> from just about every operation on the heap being O(log n), but our
> heap should be of small enough height that this won't make much
> difference compared to a linked list. We should make it a min-heap,
> with age as the key, so that the top of the heap is always the texture
> block or group with the minimum age (longest since it was last used by
> the card). We want to avoid ever doing a search of the heap, so we
> store the index into the heap alongside the pagelist.
>
> When we rearrange something in the heap, we make sure we update those
> indices. Perhaps we want a selection other than longest-since-last-use,
> but I think that might give us reasonable performance. MRU would be
> trivial with a max-heap, so that sort of policy would also be easy to
> implement with this data structure. I know you get different answers
> depending on who you ask, but what's the replacement strategy you would
> recommend?
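A minimal sketch of what that min-heap might look like, assuming an array-backed heap of pagelist indices with a `heap_pos` back-pointer stored in each pagelist entry so the heap never has to be searched. All names here are illustrative, not from any actual DRI code:

```c
#include <stddef.h>

#define MAX_BLOCKS 64

/* Hypothetical pagelist entry: 'age' is the last-use stamp, 'heap_pos'
 * is this entry's current index in the heap array (-1 = not in heap). */
struct page_entry {
    unsigned age;
    int      heap_pos;
};

struct age_heap {
    int n;
    int slot[MAX_BLOCKS];       /* heap array of pagelist indices */
    struct page_entry *pages;   /* backing pagelist */
};

static void heap_swap(struct age_heap *h, int a, int b)
{
    int tmp = h->slot[a];
    h->slot[a] = h->slot[b];
    h->slot[b] = tmp;
    /* keep the pagelist back-pointers up to date on every rearrange */
    h->pages[h->slot[a]].heap_pos = a;
    h->pages[h->slot[b]].heap_pos = b;
}

static void heap_up(struct age_heap *h, int i)
{
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (h->pages[h->slot[parent]].age <= h->pages[h->slot[i]].age)
            break;
        heap_swap(h, parent, i);
        i = parent;
    }
}

static void heap_down(struct age_heap *h, int i)
{
    for (;;) {
        int l = 2 * i + 1, r = 2 * i + 2, min = i;
        if (l < h->n && h->pages[h->slot[l]].age < h->pages[h->slot[min]].age)
            min = l;
        if (r < h->n && h->pages[h->slot[r]].age < h->pages[h->slot[min]].age)
            min = r;
        if (min == i)
            break;
        heap_swap(h, min, i);
        i = min;
    }
}

/* O(log n) insert of pagelist index 'idx'. */
static void heap_insert(struct age_heap *h, int idx)
{
    h->slot[h->n] = idx;
    h->pages[idx].heap_pos = h->n;
    heap_up(h, h->n++);
}

/* Pop the minimum-age (least recently used) block: the root. */
static int heap_pop_lru(struct age_heap *h)
{
    int idx = h->slot[0];
    h->pages[idx].heap_pos = -1;
    h->slot[0] = h->slot[--h->n];
    if (h->n > 0) {
        h->pages[h->slot[0]].heap_pos = 0;
        heap_down(h, 0);
    }
    return idx;
}
```

Flipping the two `age` comparisons turns this into the max-heap needed for an MRU policy, which is why the same structure covers both.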
One example where any type of priority queue fails is when you need one more block than is available in the largest free region. If the largest free region is 54 blocks and 55 blocks are needed, the optimal solution is most likely to reclaim one of the used blocks at the head or tail of the free region.
I think we'd also prefer to reclaim blocks that don't need to be swapped (i.e., have the throw-away bit set).
There are some cases, depending on the number of blocks needed, where we'd also prefer to reclaim blocks that aren't part of a sequence.
We may well settle on a slight variation of simple LRU or MRU. I'd like our initial implementation to allow a little more flexibility to experiment with different heuristics for improving performance.
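For the bit-map side of this, picking the largest free run is a single linear scan; a hypothetical sketch (not the existing DRM allocator), where the caller would then apply the heuristics above to reclaim used blocks at either end of the run:

```c
/* Hypothetical free-map scan: mem[i] != 0 means block i is in use.
 * Returns the start of the largest run of free blocks and stores its
 * length in *len, or returns -1 if nothing is free.  In the 54-free /
 * 55-needed example above, the caller would reclaim one used block
 * adjacent to the run this finds. */
static int largest_free_run(const unsigned char *mem, int nblocks, int *len)
{
    int best_start = -1, best_len = 0;
    int i = 0;

    while (i < nblocks) {
        if (mem[i]) { i++; continue; }
        int start = i;
        while (i < nblocks && !mem[i])
            i++;                     /* walk to the end of this run */
        if (i - start > best_len) {
            best_len = i - start;
            best_start = start;
        }
    }
    *len = best_len;
    return best_start;
}
```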
[another big snip]
>> I got to thinking about this, and I came up with another solution.
>> When process B is going to reclaim blocks from process A, process B
>> marks the reclaimed blocks with its block ID and clears the can-swap
>> bit. It also clears the can-swap bit on other blocks (that it already
>> has) that are part of the sequence. When process B calls into the
>> kernel, it passes in the block number and the old block ID & sequence
>> information. This prevents process A from getting the CPU and trying
>> to take the blocks back.
>
> Up to this point, I follow you. Here is my problem. Say we have 16KB
> blocks, and process B has a single block containing 4 vertex buffers.
> The first 3 are 1KB each and the last one is 2KB (and hangs over into
> the next block). This first block is the one that process A selected
> to swap out. Is the ID number assigned to the first block (the one
> that was swapped out) the same as that of the second block (which
> wasn't swapped out)? It seems like it should be (that would enable the
> kernel to shuffle things around and keep blocks contiguous), but it
> also seems like it would be difficult (or at least irritating) to keep
> all the block IDs correct as subregions of the blocks are allocated
> and freed by a process. When process B goes to sleep waiting for its
> blocks to come back, what locks will it hold? If it doesn't hold any,
> how do we prevent process A from coming back and stealing the second
> block?

We don't need to hold any locks when we sleep: since we have to ask the
kernel for the block back anyway, the kernel can decide whom to wake up
and whom to put to sleep. We can just wake in FIFO order from the wait
queue. We could also have the kernel make a simple decision: if it sees
that there is a lot of contention on one area, it could "recommend"
another area in some fashion for the process to put its textures in.
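The mark-and-pin handshake described above could look roughly like this. It's a hypothetical sketch: `blk`, `reclaim_block`, and `pin_sequence` are invented names, and a real version would need the hardware lock or atomics around the shared-state updates:

```c
#include <stdint.h>

/* Hypothetical shared (SAREA-style) per-block state. */
struct blk {
    uint32_t owner_id;   /* block ID of the owning context */
    uint32_t seq;        /* position within the owner's sequence */
    unsigned can_swap:1; /* clear => must not be stolen right now */
};

/* Process B steals a block from process A: stamp it with B's ID and
 * clear can-swap so A cannot immediately take it back.  The previous
 * owner/seq are returned so B can pass them to the kernel, which then
 * decides whom to wake (FIFO) once the copy completes. */
static uint32_t reclaim_block(struct blk *b, uint32_t new_owner,
                              uint32_t *old_seq)
{
    uint32_t old_owner = b->owner_id;

    *old_seq = b->seq;
    b->owner_id = new_owner;
    b->can_swap = 0;
    return old_owner;
}

/* B also clears can-swap on the blocks it already holds that belong to
 * the same sequence, so the whole unit stays put while it is in use. */
static void pin_sequence(struct blk *blocks, int n, uint32_t owner)
{
    for (int i = 0; i < n; i++)
        if (blocks[i].owner_id == owner)
            blocks[i].can_swap = 0;
}
```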
[snip]
>> How would we dole out IDs? Would there be a kernel call to get /
>> release a set of IDs?
>
> Probably, however there are methods to do this without kernel
> intervention. The good ole' bit vector is great for doling out keys. ;)
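A userspace bit-vector ID allocator is only a few lines; this sketch is single-process only (a shared version would need atomic test-and-set on the words), and the names are illustrative:

```c
#include <stdint.h>

#define MAX_IDS 128

/* One bit per ID; bit set means the ID is handed out. */
static uint32_t id_map[MAX_IDS / 32];

/* Find-first-clear, set it, and return it; -1 if all IDs are taken. */
static int alloc_id(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!(id_map[i / 32] & (1u << (i % 32)))) {
            id_map[i / 32] |= 1u << (i % 32);
            return i;
        }
    }
    return -1;
}

static void free_id(int id)
{
    id_map[id / 32] &= ~(1u << (id % 32));
}
```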
Oh. Duh. :)

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel