I had a chance to read your ideas on memory management last night. First off, I'd like to thank you for doing a very good job of collecting requirements and then separating out your ideas for implementation. This level of discipline really helps me understand where you are constrained by requirements vs. where you are exploring solutions.
As you address the very complex issue of virtualizing graphics subsystem resources, I'm going to attempt to influence your thinking to include the concept of a 3D desktop compositing engine. You've made references to capabilities that Apple is supporting, yet to me the ultimate challenge that the Apple desktop paradigm provides today is the 3D and compositing effects they are doing with Genie-effect window iconification and multilevel window transparency. Starting to address these capabilities in open source will place additional demands on resource management.
Ian Romanick wrote:
What follows is the collected requirements for the new DRI memory manager. This list is the product of several discussions between Brian, Keith, Allen, and myself several months ago. After the list, I have included some of my thoughts on the big picture that I see from these requirements.

Are the AGP aperture issues present for any AGP page swapping, or just for assigning new, random virtual memory pages? I was under the impression that preallocated AGP memory could be swapped in and out on the x86 platform. In other words, it would be difficult to dynamically map a user texture into the AGP aperture, but we could create a pool of AGP memory that is larger than the aperture and use the APPLE_client_storage extension to allocate space from that pool to the application.
1. Single-copy textures
Right now each texture exists in two or three places: a copy in on-card or AGP memory, a copy in system memory (managed by the driver), and a copy in application memory. Any solution should be able to eliminate one or two of those copies.
If the driver-tracked copy in system memory is eliminated, care must be taken when the texture needs to be removed from on-card / AGP memory. Additionally, changes to the texture image made via glCopyTexImage must not be lost.
It may be possible to eliminate one copy of the texture using APPLE_client_storage. A portion of this could be done purely in Mesa. If the user supplied image matches the internal format of the texture, then the driver can use the application's copy of the texture in place of the driver's copy.
Modulo implementation difficulties, it may even be possible to use the pages that hold the texture as backing store for a portion of the AGP aperture. This is the only way to truly achieve single-copy textures. The implementation may prove too difficult on existing x86 systems to be worth the effort. This functionality is available in MacOS 10.1, so the same difficulties may not exist on Linux PPC.
2. Share texture memory among multiple OpenGL contexts

For traditional 2D window systems, this requirement is sufficient in that you don't need to be able to truly provide an unlimited amount of private buffer space; rather, when you run out of space, you can fall back to a method where memory is allocated from a single large buffer based on visible display pixels.
Texture memory is currently shared by all OpenGL contexts. That is, when an OpenGL context switch happens it is not necessary to reload all textures. The texture manager needs to continue to use a paged memory model (as opposed to a segmented memory model).
3. Accommodate other OpenGL buffers
The allocator should also be used for allocating vertex buffers, render targets (pbuffers, back-buffers, depth-buffers, etc.), and other buffers. This can be useful beyond supporting SGIX_pbuffer, ARB_vertex_array_objects, and optimized display lists. Dynamically allocating per-context depth and back-buffers will allow multiple Z depths to be used at a time (i.e., a 16-bit depth-buffer for one window and a 24-bit depth-buffer for another) and super-sampling FSAA.
That said, a 3D compositing window system couldn't fall back on this method. Imagine N transparent windows all stacked on top of each other, each needing dedicated display resources in order to yield the correct final display results. Virtualizing an infinite number of color and alpha layers may not be possible in hardware alone, but software compositing can be prohibitively slow. Perhaps providing a large dedicated amount of resources to 3D compositing and virtualizing all non-visible resources could provide a reasonable solution. This implies that back buffers, depth buffers, pbuffers, and super-sampled buffers all need to be potentially swapped out when the rendering context is swapped out.
4. Support texture pseudo-render targets

Why not swap out another context's depth buffer? If it's not being used at the time, is that any worse than swapping out textures that are actively being used by the yielding context?
Accelerating some OpenGL functions, such as glCopyTexImage, SGIS_generate_mipmaps, and ARB_render_texture, may require special support and consideration.
5. Additional AGP related issues
There may be cases where textures need to be moved back-and-forth between AGP and on-card memory. For example, a texture might reside in AGP memory, and an operation may be requested that requires that the texture be in on-card memory.
6. Additional texture formats and layouts
Compressed, 1D, 3D, cube map, and non-power-of-two textures need to be supported in addition to "traditional" 2D power-of-two textures.
7. Allen Akin's pinned-texture proposal
If we ever expose memory management to the user (beyond texture priorities) we want to be sure our allocator is designed with this in mind.
8. Device independence
As much as possible, the source code for the memory manager should live somewhere device independent. This is both for the benefit of newly developed drivers and for maintaining existing drivers.
* My Thoughts *
There are really only two radical departures from the existing memory manager. The first is using the memory manager for non-texture memory objects. The second, which is partially a result of the first, is the need to "pin" objects. It would not do to have one context kick another context's depth-buffer out of memory!
My initial thought on how to accomplish this was to move the allocator into the kernel. There would be a low-level allocator that could be used for non-texture buffers and a way to create textures (from data). In the texture case, the kernel would only allocate memory when a texture was used. Instead of using the actual texture address in drawing command streams, the user-level driver would insert texture IDs. The kernel would use these IDs to map to real texture addresses.

There is another caveat for PBuffers that Allen brought to my attention a few years ago. The way they are currently defined, it's possible for the application to request a PBuffer that can not be "destroyed", but rather must be swapped out and then restored later.
The benefit is that all memory management would be handled by a single omniscient execution context (the kernel). The downside is that it would move a LOT of code into the kernel. It would be almost entirely OS and device independent, but there would likely be a lot of it.
After talking with Jeff Hartmann in IRC on 1/13, I started thinking about all of this again. Jeff had some serious reservations about moving that volume of code into the kernel, and he believed that all of the requirements could be met by a purely user-space implementation. After thinking about things some more, I'm starting to agree.
What follows is a fairly random series of thoughts on how a user-space memory manager could be made to work.
I believe that everything could be done by breaking each memory space down into blocks (as is currently done) and tracking two values, either implicitly or explicitly, with each block. The first value is some sort of swap-out priority. This is currently implicitly tracked by the list ordering in the SAREA. The other value is basically a semaphore, but it could be implemented as a simple can-swap bit.
Blocks that hold an active depth-buffer would never have can-swap set. Blocks that hold "normal" textures, back-buffers, render-target textures, and pbuffers would have their can-swap bit conditionally set. Each of these types of blocks would have the can-swap bit cleared under the following situations:
- Normal textures - While a rendering operation is queued that
will use the texture.
- SGIS_generate_mipmaps textures - While the blits are in progress
to create the filtered mipmaps.
- glCopyTexImage textures - While the blit to copy image data to
the texture is in progress and while the data in the texture has
not been copied to some sort of backing store.
- pbuffers - While rendering operations to the pbuffer are in
progress. pbuffers have a mechanism to tell an application when
the contents of the pbuffer have been "lost." This could be
exploited by the memory manager. One caveat is when a pbuffer
is bound to a texture (ARB_render_texture). While the pbuffer
is bound to a texture, its contents cannot be lost. Can the
contents be "swapped out" to some sort of backing store, like
with glCopyTexImage targets?
- Back-buffers - In unextended GLX, back-buffers can never be
swapped. However, if OML_sync_control is available, a "double
buffered" visual may want to have many virtual back-buffers.
Each time glXSwapBuffersMscOML (essentially an asynchronous
glXSwapBuffers call) is made, a new back-buffer is allocated as
the rendering target. Once a back-buffer is copied to the
front-buffer (i.e., the queued buffer-swap completes), the
back-buffer can be swapped-out.
There may be other situations where can-swap is cleared, but that's all I could think of. Similar rules would exist for vertex buffers (for ARB_vertex_array_object, EXT_compiled_vertex_array, optimized display lists, etc.).
Only a single bit per block is needed in the SAREA. That bit is the union of the bits for each object that is part of that block. This union must be calculated by the user-space driver. This presents a possible problem of user-space clients failing to update the can-swap bits for some reason (process hung on blocking IO call?). The current implementation avoids this problem by forcing all blocks to be swappable at all times.
At this point I'm left with a few questions.
1. In a scheme like this, how could processes be forced to update the
can-swap bits on blocks that they own?
2. What is the best way for processes to be notified of events that
could cause can-swap bits to change (i.e., rendering completion,
asynchronous buffer-swap completion, etc.)? Signals from the kernel?
Polling "age" variables?
3. If some sort of signal based notification is used, could it be used
to implement NV_fence and / or APPLE_fence?
4. How could the memory manager handle objects that span multiple
blocks? In other words, could the memory manager be made to prefer
to swap-out blocks that wholly contain all of the objects that
overlap the block? Are there other useful metrics? Prefer to
swap-out blocks that are half full over blocks that are completely
full?
5. What other things have I missed that might prevent this system
   from working? :)
I hope I'm bringing in some food for thought...and not unnecessarily complicating an already difficult and important DRI improvement.
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
--
/\
Jens Owen / \/\ _
[EMAIL PROTECTED] / \ \ \ Steamboat Springs, Colorado