Here is the updated version of the texmem-0-0-2 design document.
Hopefully everyone will have a chance to look over the changes before
the #dri-devel meeting today. Sorry for getting this out so late.
-*- mode: text; mode: fold -*-
DRI Memory Manager texmem-0-0-2 Design
Ian Romanick <[EMAIL PROTECTED]>
{{{ 1. Introduction
The current texture management system used by the open-source DRI drivers
has reached its limits. The current implementation works, and it performs
the functionality that it implements well. However, there is a large class
of functionality that the current system simply cannot easily support.
Additionally, there are a number of cases where the current system is
clearly suboptimal.
The current texture memory manager only supports "throw away" data. It is
assumed that, with proper synchronization with the rendering hardware, any
data that is in either AGP or on-card memory can be thrown away and
re-loaded at any time. For texture data loaded from main memory, this
assumption holds true. However, there are several classes of data that need
to be locked for longer than the duration of a single rendering command.
Anytime a texture is a rendering target, be it as the destination of
glCopyTexImage or as the rendering target of a pbuffer, that texture data
cannot be thrown away until after it is copied to some sort of backing
store. This restriction prevents the current memory manager from hardware
accelerating glCopyTexImage. This restriction has also prevented
implementation of pbuffers in open-source drivers and has forced binary-only
drivers, such as ATI's FireGL drivers, to use a static, fixed-size memory
area for pbuffers.
Significant changes need to be made to the memory management system to
support render-target buffers.
In addition to supporting render-to-texture, pbuffers, and glCopyTexImage,
the DRI should be able to support dynamic allocation of back-buffers and
depth-buffers. While this is a secondary consideration, this will allow
different depth-buffer depths on the same display (i.e., one window can use
a 16-bit depth-buffer while another uses a 24-bit depth-buffer) and
full-screen anti-aliasing (FSAA).
As has recently been discussed on the dri-devel mailing list, texture
uploads in the DRI are not fully pipelined [1]. For applications that need
to use dynamic textures, such as streaming video, this presents a
performance bottleneck. In order to support fully pipelined texture
uploads, a mechanism is needed to determine when rendering using a
particular texture buffer is complete. The current memory manager lacks
such a mechanism.
Each texture image in the DRI can be replicated in memory in as many as
three places. The texture can exist in on-card memory, in backing-store in
main memory, and in storage in the application. As the texture requirements
of games and other applications continue to increase, duplicating texture
data becomes more and more painful. In non-extended OpenGL it should be
possible to reduce the number of copies of a texture to no more than two.
With the use of extensions such as APPLE_client_storage[2] or
ARB_pixel_buffer_object[3], it should be possible to achieve single-copy
textures.
The remainder of this document is divided into two major sections. The
first section details the design of the new memory manager. This section is
further divided into a section detailing the data structures used in the
implementation and a section detailing division of work between the kernel
driver and the user space driver.
As this is intended to be a living document, the remaining section describes
the open issues addressed during the design process. Issues are marked as
either open or closed. Each issue will have some description of the problem
and possible solutions. Each closed issue will have a description of the
resolution. It is expected that the resolution of closed issues will be
incorporated in the other sections of the document.
}}}
{{{ 2. Design of the memory manager
{{{ 2.1 Data structures
Each of the main data structures used by the memory manager lives in a
shared memory space and is accessed by all processes on the system. As
such, these shared memory areas are protected by a mutex. In the initial
implementation all memory areas will share the mutex currently used to
serialize access to the existing SAREA. In the future additional mutexes may
be created.
{{{ 2.1.1 Memory block
The regions of graphics memory (i.e., AGP aperture, on-card, etc.) are
divided into a fixed number of logical blocks. For each region of graphics
memory, there is an array of 'struct memory_block' in main memory that
represents these logical blocks. The ith element in the array represents
the ith logical block in the memory region.
The granularity of the blocks can vary depending on a number of factors, and
may be driver dependent. This is similar to the way the current memory
manager works. However, the current memory manager divides memory into 64KB
blocks, with a maximum of 64 blocks. The current intention is to divide
each region into much smaller blocks and allow up to 65536 blocks. For texture
data, 64KB blocks are fine, but this is not fine enough granularity for
vertex data.
The most important field in the memory_block structure is fence. Each
driver maintains a 32-bit counter of rendering operations. This counter is
maintained at whatever granularity the driver sees fit. This may be per
rendering operation, per DMA buffer, or some other granularity. Each object
(texture, vertex buffer, etc.) stored in a memory_block has an individual
fence value. When the driver fence value is greater than the fence value
for the object, then all in-flight rendering operations that use that object
have completed. The fence value stored in the memory_block is the maximum
fence value over the set of objects in the memory_block.
Since individual objects can span multiple memory blocks, each memory_block
tracks a sequence number and an ID. A group of blocks with the same ID
number is considered to be a linear memory range ordered by sequence
numbers. This information is used to allow the various parts of the memory
manager to page blocks in and out and maintain proper ordering of data in
memory.
There is also a set of flags associated with each block. If BLOCK_PIN is
set for a block, then the block is pinned in memory and cannot be moved or
removed regardless of the fence value. This would be used for back-buffers
that are queued to be swapped to the front-buffer by an asynchronous
buffer-swap call. However, it could be used in any case where the block
depends on some activity other than rendering. For example, an
implementation of ATI_map_object_buffer or ARB_vertex_buffer_object would
need to use this bit while the buffer is mapped.
If all of the objects in a block are backed in system memory, BLOCK_CLOBBER
can be set. Rather than paging the block out when it needs to be reclaimed,
the contents will simply be thrown away. This is similar to the way that
the system memory manager handles read-only memory pages that are backed by a
file (i.e., the TEXT section of an executing program).
On some architectures memory in the AGP aperture can be cached by the CPU.
There may be some other cases where memory that is readable by the graphics
chip (PCI-GART memory?) can be cached. In these cases BLOCK_CACHABLE is
set. This can allow direct reading of the memory block in cases where a
software fallback is performed by the GL. If the block is not cachable, it
may be better to page the memory back to system memory or modify the AGP
mapping for optimal performance.
enum {
    BLOCK_PIN      = (1U << 0),  /** Block is pinned & cannot move. */
    BLOCK_CLOBBER  = (1U << 1),  /** Clobber block on reclaim. */
    BLOCK_CACHABLE = (1U << 2),  /** Is this memory area cachable by the CPU? */
};

struct memory_block {
    u32 fence;     /** Fence age of the last-to-expire object in the block. */
    u16 status;    /** Status flag bits. */
    u16 id;        /** ID number of the block. */
    u16 sequence;  /** Sequence number of block in group of blocks. */
};
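The fence test described above can be sketched in a few lines of C. The
structure definition is repeated so the fragment stands alone; the
wrap-around-safe comparison is my assumption, since the document does not
say how the 32-bit counter behaves when it overflows.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint16_t u16;

struct memory_block {
    u32 fence;     /* Fence age of the last-to-expire object in the block. */
    u16 status;    /* Status flag bits. */
    u16 id;        /* ID number of the block. */
    u16 sequence;  /* Sequence number of block in group of blocks. */
};

/* A block is idle once the driver's fence counter has passed the block's
 * fence.  The cast-to-signed subtraction is an assumption on my part; it
 * keeps the comparison correct across 32-bit counter wrap-around.
 */
static int block_is_idle(const struct memory_block *blk, u32 driver_fence)
{
    return (int32_t)(driver_fence - blk->fence) > 0;
}
```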
}}}
{{{ 2.1.2 Available block ID list
Each sequence of memory blocks used by a process must have a system unique
block ID. Block IDs are in the range [0, 0xffff]. Available block IDs are
stored in a bit vector. If a bit in the vector is cleared, that block ID is
available. If a bit in the vector is set, that block ID is in use by some
process.
To allocate an ID, a process must obtain a lock and search the table for a
cleared bit. After finding a clear bit, the process sets the bit and
releases the lock. This should not happen frequently and should be very
fast. It should also be possible to implement the search using a cmpxchg
instruction on CPUs that support it.
Either implementation suffers from having to perform a linear search each
time an ID is to be allocated. One way to reduce this cost is to allocate
multiple IDs at a time (perhaps all of the available IDs represented in a
single byte or 32-bit word) and keep the IDs when they are not in use. The
driver would have to be careful not to keep too many unused IDs or the
system might run out of available IDs.
The bit vector is stored in an 8KB shared memory segment. It is never
accessed by the kernel. The bit-vector was chosen because it represents the
65536 possible IDs in a very compact way.
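As a sketch of the allocation search (locking omitted; a real
implementation would hold the SAREA mutex, or use cmpxchg as described
above), the scan over the bit vector might look like this:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCK_IDS 65536
#define BITS_PER_WORD 32

/* Illustrative sketch: scan the shared bit vector for a clear bit, set
 * it, and return the corresponding block ID.  The caller is assumed to
 * hold the lock protecting the vector.
 */
static int allocate_block_id(uint32_t *vec)
{
    unsigned i, bit;

    for (i = 0; i < MAX_BLOCK_IDS / BITS_PER_WORD; i++) {
        if (vec[i] != 0xffffffffu) {
            for (bit = 0; bit < BITS_PER_WORD; bit++) {
                if (!(vec[i] & (1U << bit))) {
                    vec[i] |= (1U << bit);
                    return (int)(i * BITS_PER_WORD + bit);
                }
            }
        }
    }
    return -1;  /* All 65536 IDs are in use. */
}
```

Note that 65536 bits is exactly the 8KB shared memory segment mentioned
above.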
}}}
{{{ 2.1.3 Modified block list
The list of modified blocks is tracked using a simple bit vector. When a
process reclaims a block that is owned by a different process, the bit for
that block ID is marked in the bit vector. When a process wants to use a
particular object, it must check the bit in the vector for the memory_block
associated with the object. If the bit is set, then some other process
reclaimed a portion of the memory associated with that memory_block ID.
There is a subtle detail that needs to be kept clear here. The bit field
tracks modified memory_blocks by block ID. Multiple memory_blocks can have
the same block ID. If the bit for a block ID is set in the vector, that
only means that at least one memory_block has been modified. Depending on
where individual objects are located in the sequence of blocks and which
blocks in the sequence were reclaimed, the reclaimed blocks may not need to
be brought back in.
To determine which logical block was modified, the process must look at the
memory_block structures for the object. It is up to the driver to maintain a
mapping of which elements in the array of memory_block structures match a
particular ID. If the block ID on one of the array elements has changed,
then the block was modified. It is up to the driver to determine if the
block needs to be paged back in or restored from some other backing store.
In practice, storing the mapping and determining how to restore a block is
handled in device independent user-space code.
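A hypothetical helper showing the check described above: the bit test by
block ID, followed by a walk over the object's own blocks to see whether
any ID field actually changed. The function and its parameters are
illustrative, not part of the proposed interface.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint16_t u16;

struct memory_block {
    u32 fence;
    u16 status;
    u16 id;
    u16 sequence;
};

/* 'indices' is the driver-maintained mapping from this object to its
 * elements in the memory_block array.  Returns non-zero if any of the
 * object's blocks were reclaimed by another process.
 */
static int object_was_modified(const uint32_t *modified_vec,
                               const struct memory_block *blocks,
                               const unsigned *indices, unsigned count,
                               u16 block_id)
{
    unsigned i;

    if (!(modified_vec[block_id / 32] & (1U << (block_id % 32))))
        return 0;   /* No block with this ID was touched. */

    for (i = 0; i < count; i++) {
        if (blocks[indices[i]].id != block_id)
            return 1;   /* This block was reclaimed by another process. */
    }
    return 0;   /* The set bit referred to some other object's blocks. */
}
```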
}}}
{{{ 2.1.4 Free Memory / Reclaim List
The method used to store the free-list can make or break the performance of
any memory management algorithm. Much research has been done on paging
strategies for virtual memory operating systems. The architecture of a
processor's memory management unit is similar to what is done here, but
different in a subtle way. A CPU's MMU can map out individual pages of
memory, and we can map out individual blocks. However, the CPU's MMU can
replace individual pages at any arbitrary physical address. We cannot.
Blocks with the same ID must remain ordered by their sequence numbers.
This allows us to take memory out like paged virtual memory, but we have to
bring the memory back in like segmented virtual memory.
Some tricks can be done with the AGP aperture to work around this, but those
tricks do nothing for on-card memory. On-card memory is several orders of
magnitude faster than AGP memory, so optimizing the use of on-card memory is
a top priority. The lack of available papers on this subject and the
need for performance has led to the design of several candidate data
structures. The intention is to implement and profile each.
There are two things that need to be tracked. The first is the set of
free blocks. The other is an ordered set of allocated blocks. The set of
allocated blocks is ordered by reclaim priority. The allocated block with
the highest reclaim priority is the block that will be reclaimed first when
there are no free blocks available.
{{{ 2.1.4.1 Linked List
}}}
{{{ 2.1.4.2 Linked with Bit-Vector
}}}
}}}
{{{ 2.1.5 driMemoryBuffer
The memory_block structure is used to share information about video memory
between processes. The driMemoryBuffer structure is used within a process
to track allocated buffers. Section 2.2.2.1, Internal driver interface,
describes the way that individual drivers interact with this data structure.
All of the code for working with driMemoryBuffer "objects" is device
independent, but is opaque to the driver code.
struct driMemoryBuffer {
    /* The following fields are set when the memory buffer is allocated
     * by a driver (i.e., always).
     */
    struct driMemoryBuffer * next;
    struct driMemoryBuffer * prev;

    size_t size;                   /** Size of the object, in bytes. */
    uint_fast32_t required_flags;  /** Required memory property flags for
                                    * the memory buffer.
                                    */
    uint_fast32_t optional_flags;  /** Optional memory property flags for
                                    * the memory buffer.
                                    */

    /* The following fields are set when the memory buffer is committed
     * to a memory region.
     */
    uint_fast32_t actual_flags;    /** Memory property flags of the memory
                                    * holding the buffer.
                                    */
    uintptr_t offset;              /** Offset of the buffer within the
                                    * memory heap.
                                    */
    uint_fast32_t fence;           /** Fence age of this object. */
    uint_fast16_t id;              /** ID number of the block containing
                                    * this object.
                                    */
    uint_fast16_t status;          /** Status flag bits. */
    uint_fast16_t block_id;        /** Block ID of the memory_block that
                                    * holds this buffer.
                                    */
    uint_fast8_t heap;             /** Heap ID where the buffer is located. */
};
The size, required_flags, and optional_flags fields capture the parameters
supplied to driBufferAllocate. The remaining fields, with the exception of
the status field, remain uninitialized until the buffer is committed to a
region of video memory.
Unlike the fields of the memory_block, which must be defined with absolute
machine limits, there are no such requirements on the fields of this
structure. This structure will only be used within a driver, so there will
never be backward compatibility issues. As such, only lower limits need be
set on the size of the fields. Each platform is allowed to pick the optimal
data size.
When a block is committed to a memory region, the heap, offset, block_id, and
actual_flags fields are set. In addition, BLOCK_PIN will be set in status.
The fence will be cleared to zero.
When a driver attempts to commit a driMemoryBuffer (see
driBufferGetCPUAddress and driBufferGetCardAddress in section 2.2.2.1), the
device independent code checks to see if the buffer has been committed before.
This is done by checking that actual_flags satisfies required_flags (i.e.,
(actual_flags & required_flags) == required_flags). If the block has been
committed, the block_id is used to test the modified block list (see section
2.1.3) to determine if there are any blocks with that block ID that have
been modified.
The actual blocks that make up this object will be tested. The heap,
offset, and size fields are used to determine which blocks make up this
object. If the block IDs on the blocks match the block ID on the object,
then they have not been modified. If any of the blocks have been modified,
one of two things will happen. If the driMemoryBuffer is not clobberable,
the device independent code will ask the kernel to bring the blocks back in.
If the object is clobberable, the device independent code will attempt to
allocate space for the object.
Once the object is committed to memory (either by the object having never
been modified or by the object being replaced in memory), the object and all
of the memory_blocks that make it up will be pinned.
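The "committed before?" test reduces to the flags comparison quoted
above. A one-line helper (illustrative only, not part of the proposed
API) makes the semantics explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: a driMemoryBuffer counts as previously committed
 * when every bit of required_flags is present in actual_flags.
 */
static int buffer_is_committed(uint32_t actual_flags, uint32_t required_flags)
{
    return (actual_flags & required_flags) == required_flags;
}
```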
}}}
}}}
{{{ 2.2 Division of labor
One of the design goals of the new memory manager, and the DRI in general,
is to facilitate user upgrades and bi-directional compatibility (i.e.,
forward and backward compatibility). This design tenet has led to placing
very little decision making logic in the kernel. Basically, the desire is
to put code in the kernel that can only go in the kernel. This trend
continues with the new memory manager. The vast majority of the new
functionality is implemented in the user-space portion of the driver.
{{{ 2.2.1 What happens in the kernel
There are two difficulties with adding new functionality to the kernel.
Adding extra code to the kernel presents portability problems. It is much
more difficult to port kernel code from Linux to BSD or Solaris or anything
else than it is to port user-space code. Additionally, changing
functionality in kernel code can present compatibility problems. These are
both issues that can only present headaches to developers and users alike.
As such, the functionality that is added to the kernel in the new memory
manager is limited to glorified data movement.
{{{ 2.2.1.1 Page out
The primary API exposed to user-space from the kernel is for paging memory
blocks in and out. When the user-space driver detects that a block of
memory needs to be reclaimed, there are two types of blocks that can be
reclaimed. Either a throw-away block can be reclaimed or a non throw-away
block. If a throw-away block is selected, no kernel intervention is
required. If a non throw-away block is selected, the user-space driver must
ask the kernel to page the block out.
int drmPageOutMemoryBlock( int fd, unsigned heap_id, unsigned index,
                           unsigned block_id, unsigned sequence );
The 'index' is the index of the memory block to page out in the array of
memory blocks. If the fence for the specified memory has not completed, the
call will wait for the fence to complete. When a block is paged out, the
bit for its block ID in the modified-block bit vector is set. The
'block_id' and the 'sequence' are the block ID number and the sequence
number for the data that will be placed in the block.
In addition, the user-space driver can ask the kernel to page a memory
block back in.
int drmPageInMemoryBlock( int fd, unsigned heap_id, unsigned index,
                          unsigned block_id, unsigned sequence );
The 'index' is the index of the destination memory block, and 'block_id' and
'sequence' specify the memory block to page back in.
The third entry point is a short-cut that combines the first two. This
allows a memory block to be paged out and another block paged back in at the
same location.
int drmPageOutPageInMemoryBlock( int fd, unsigned heap_id, unsigned index,
                                 unsigned block_id, unsigned sequence );
}}}
{{{ 2.2.1.2 AGP Re-mapping
The kernel is also capable of doing some AGP mapping tricks that allow the
implementation of true single-copy textures. With the AGPGART 2.0
interface, which should be available in the 2.5.60 or 2.5.61 Linux kernel,
the pages backing portions of the AGP aperture can be changed dynamically.
Using this functionality, the DRM could use the copy of the texture in the
driver as the backing for a portion of the AGP aperture. If
APPLE_client_storage[2] is used, pages in the application could be used to
back the AGP aperture.
This functionality will likely not be available in the first release of the
new memory manager.
}}}
}}}
{{{ 2.2.2 What happens in user-space
Just like in the current memory manager, the decision making logic happens
in the user-space portion of the driver. In user-space, the driver handles
requests to allocate and free memory for textures and other data. The
user-space portion also makes policy decisions about which data blocks are to
be paged out of memory to make room for other blocks. The code that
implements the user-space portion of the memory manager is device
independent and lives in lib/GL/mesa/src/drv/common/.
{{{ 2.2.2.1 Internal driver interface
Since the memory manager is implemented in a device independent way, there
is a single API presented by the memory manager to the driver. This
interface allows drivers to allocate and free buffers and to control if and
when the buffers can be reclaimed by other processes. The API intentionally
does not expose any of the data structures from section 2.1 to the driver.
Each memory pool (i.e., on-card memory, AGP memory, etc.) has an associated
set of attributes.
enum driMemoryBufferFlags {
    BUFFLAG_COLOR_BUFFER   = (1U << 0),   /** Buffer can be used as a color
                                           * buffer destination for rendering.
                                           */
    BUFFLAG_DEPTH_BUFFER   = (1U << 1),   /** Buffer can be used as a depth
                                           * buffer destination for rendering.
                                           */
    BUFFLAG_STENCIL_BUFFER = (1U << 2),   /** Buffer can be used as a stencil
                                           * buffer destination for rendering.
                                           */
    BUFFLAG_TEXTURE        = (1U << 3),   /** Buffer can be used to source
                                           * texture data.
                                           */
    BUFFLAG_VERTEX         = (1U << 4),   /** Buffer can be used to source
                                           * vertex data.
                                           */
    BUFFLAG_COMMAND        = (1U << 5),   /** Buffer can be used to source
                                           * rendering commands.
                                           */
    BUFFLAG_CACHABLE       = (1U << 6),   /** Buffer is cachable by the CPU. */
    BUFFLAG_RESERVED0      = (1U << 7),   /** Reserved for future use. */
    BUFFLAG_DRIVER0        = (1U << 8),   /** Private flag usable by the
                                           * driver.
                                           */
    BUFFLAG_DRIVER1        = (1U << 9),   /** Private flag usable by the
                                           * driver.
                                           */
    BUFFLAG_DRIVER2        = (1U << 10),  /** Private flag usable by the
                                           * driver.
                                           */
    BUFFLAG_DRIVER3        = (1U << 11),  /** Private flag usable by the
                                           * driver.
                                           */
};
The driBufferAllocate function is used to allocate a buffer in memory. When
the buffer is no longer needed, driBufferRelease is used to free the memory.
If a buffer is released that is pinned, it is unpinned. If a buffer is
released that has a fence set that has not completed, the buffer will be
queued for deletion. As soon as the fence is complete, the buffer will be
released. The idea is that both of these functions are cheap to call. In
most cases it will be better for a driver to throw away a buffer and
allocate a new one than to wait for the fence on the existing buffer.
int driBufferAllocate( struct driMemoryBuffer ** buf, unsigned size,
                       unsigned required_flags, unsigned optional_flags,
                       unsigned * actual_flags );
int driBufferRelease( struct driMemoryBuffer * buf );
Simply allocating a buffer may or may not reserve space for the buffer.
When driBufferGetCPUAddress or driBufferGetCardAddress is called, the buffer
will be committed to memory and will be pinned. It is very important that
the memory be pinned at this point. If it were not pinned, another process
could reclaim the memory before the allocating process would have a chance
to use the memory.
int driBufferGetCPUAddress( struct driMemoryBuffer * buf, void ** addr );
int driBufferGetCardAddress( struct driMemoryBuffer * buf, void ** addr );
Once the address of the buffer is fixed, the driver can use an appropriate
mechanism to load data into the buffer or read data from the buffer.
Most buffers do not need to be saved when they are reclaimed by another
process. Some buffers, such as textures that have been modified by
glCopyTexImage, must be saved. The driBufferCanClobber function is used to
set this state. By default all buffers are marked as clobberable. If a
buffer cannot be clobbered, it should be marked as such by calling
driBufferCanClobber with can_clobber set to GL_FALSE.
int driBufferCanClobber( struct driMemoryBuffer * buf,
                         GLboolean can_clobber );
When a buffer is submitted to the hardware for rendering, a fence needs to
be set on it. By examining the fence on a buffer, the driver can know if
the buffer is being used by the hardware. Along with
driBufferSetFence, two functions exist to test a fence for completion or
wait for a fence to complete.
int driBufferSetFence( struct driMemoryBuffer * buf, unsigned fence );
int driBufferWaitFence( struct driMemoryBuffer * buf );
GLboolean driBufferTestFence( struct driMemoryBuffer * buf );
Properly setting fences is very important. Usage statistics for a buffer
are updated each time a fence is set. These statistics may be used to
determine which buffers are reclaimed when additional memory is needed. For
example, if buffers are reclaimed on an LRU or MRU basis, setting a fence is
the only way to update the buffer's usage timestamp.
After a fence has been set for a buffer, it can typically be unpinned. The
fence is generally enough of a synchronization method.
int driBufferPin( struct driMemoryBuffer * buf );
int driBufferUnpin( struct driMemoryBuffer * buf );
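The typical commit / load / fence / unpin sequence can be sketched as
follows. The stub implementations exist only to make the sequence
executable; their signatures are simplified (the real driBufferAllocate
also takes flags arguments), and they model nothing beyond the pin and
fence state transitions described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-ins for the driBuffer API.  These are NOT the proposed
 * implementations; they only track the pin/fence state so the call
 * sequence below runs.
 */
struct driMemoryBuffer {
    size_t size;
    void *storage;
    unsigned fence;
    int pinned;
};

static int driBufferAllocate(struct driMemoryBuffer **buf, unsigned size)
{
    *buf = calloc(1, sizeof(**buf));
    (*buf)->size = size;
    return 0;
}

static int driBufferGetCPUAddress(struct driMemoryBuffer *buf, void **addr)
{
    if (!buf->storage)
        buf->storage = malloc(buf->size);
    buf->pinned = 1;            /* Committing a buffer pins it. */
    *addr = buf->storage;
    return 0;
}

static int driBufferSetFence(struct driMemoryBuffer *buf, unsigned fence)
{
    buf->fence = fence;
    return 0;
}

static int driBufferUnpin(struct driMemoryBuffer *buf)
{
    buf->pinned = 0;
    return 0;
}

/* The typical upload sequence: commit (which pins), load data, submit,
 * fence, then unpin. */
static struct driMemoryBuffer *upload_texture(const void *data, unsigned size,
                                              unsigned fence)
{
    struct driMemoryBuffer *buf;
    void *addr;

    driBufferAllocate(&buf, size);
    driBufferGetCPUAddress(buf, &addr);
    memcpy(addr, data, size);
    /* ... submit rendering commands that reference buf ... */
    driBufferSetFence(buf, fence);
    driBufferUnpin(buf);        /* The fence now protects the buffer. */
    return buf;
}
```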
The function driBufferWasLost is used to determine if the data associated
with a buffer has been lost. At any point in time a buffer can be in one of
four states. It can never have been committed (i.e., by calling either
driBufferGetCPUAddress or driBufferGetCardAddress), clobbered when another
process reclaimed the memory, paged out (but restorable by the kernel), or
still in memory (and unmodified). In the first two cases the driver needs
to commit the buffer and load data, and driBufferWasLost will return
GL_TRUE. In the other two cases the driver need only commit the buffer,
and driBufferWasLost will return GL_FALSE.
GLboolean driBufferWasLost( struct driMemoryBuffer * buf );
Each driver will need to have a wrapper around _tnl_run_pipeline. At a
minimum, the wrapper needs to call driBufferWasLost on each buffer that will
be used. All buffers that will be used need to be pinned. After calling
_tnl_run_pipeline, a fence needs to be set on each buffer, and buffers that
do not need to be pinned should be unpinned.
At the end of each frame (i.e., when glXSwapBuffers is called),
driBufferPoolEndFrame should be called. Depending on the policies used to
reclaim allocated memory, driBufferPoolEndFrame can help track per-frame
buffer usage statistics.
int driBufferPoolEndFrame( unsigned pool_id );
}}}
{{{ 2.2.2.2 Creating memory pools
The actual work to create the memory pools happens in the kernel. There is
a certain amount of work done in user-space to determine what pools are
available. Most AGP graphics cards will have three memory pools.
1. On-card memory. This pool can typically hold any type of buffer.
2. AGP aperture. This pool can usually only hold texture data, vertex
data, and command buffers.
3. PCI DMA memory. This pool can typically only hold command buffers,
but may be able to hold other types of buffers in emergency,
low-memory situations.
[Most of the rest of this section depends on the resolution to issue #3.13.]
}}}
{{{ 2.2.2.3 Reclaiming used blocks
Further discussion is required before this section can be written.
}}}
}}}
}}}
}}}
{{{ 3. Open Issues
3.1 How do we determine which blocks to reclaim? (open)
3.2 Are the sizes of the status, id, and sequence fields for 'struct
memory_block' adequate? (open)
3.3 How does a process map from a memory address to a block ID? (open)
3.4 How are block IDs reclaimed if a process crashes? (open)
3.5 What happens when all the block IDs are exhausted? (open)
3.6 Should the kernel calls to page blocks in / out work on a single block
at a time, or should they take a list of blocks? If they take a list
of blocks, should it be required to be a linear list (i.e., page out N
blocks, starting with block M)? (open)
3.7 There seem to be race conditions in assigning fences, paging blocks in
/ out, and copying in data. (closed)
The race I envision is that paging out blocks, copying in data, and
assigning a fence number is a multi-step process. The first part of the
process happens without any locks held. Moreover, the first step can block.
What happens if another process becomes active between the return from
drmPageOutMemoryBlock and the call to copy data into the memory?
Solution: Whenever blocks are paged in from backing store or initially
committed, they are pinned. This gives the driver some time to set a fence
without fear of the blocks being stolen. The problem with this solution is
that it exacerbates issue 3.15.
3.8 What interfaces are needed to create memory heaps? (open)
3.9 What interfaces are needed to copy data around (i.e., from main memory
to on-card, AGP to on-card, on-card to AGP, etc.)? (open)
3.10 Fencing (open)
There is quite a bit of discussion of fences to protect memory blocks. What
additional support would be required for NV_fence or APPLE_fence? Is it
worth the effort to design in that extra support now?
3.11 Where should the code live? (open)
If the memory manager is truly device independent, should it live somewhere
other than in the lib/GL/mesa/ subtree? If the code might be used by other
driver vendors (i.e., binary-only drivers based on DRI), should it live in
lib/GL/dri/?
3.12 Can buffer attributes change? (open)
Should it be possible to change the attributes of a buffer? It might be
useful to allocate a pbuffer with (BUFFLAG_COLOR_BUFFER | BUFFLAG_TEXTURE)
initially, but change it to (BUFFLAG_TEXTURE) when it is bound to a texture.
If this is allowed, what sort of API should be used to control it?
3.13 Maximum texture size (open)
The current memory manager calculates the maximum texture size (as returned
from GL_MAX_TEXTURE_SIZE) based on the amount of available texture memory
and the number of texture units. The size is the maximum size that could be
bound to all texture units and still fit into available texture memory.
With the new memory manager, this system will not work.
1. Non-texture data can be put in "texture" memory and some buffers can
be pinned, making it impossible to calculate, in advance, the
amount of available texture memory.
2. Calculating the maximum texture size in advance fails to consider the
space savings from using compressed textures.
3. It is unlikely that cards that can use many textures per pass (the
R200 can use 6) will have the maximum number of maximum size textures
bound at once. This can artificially limit the maximum size of
textures that can be used on a texture unit. On a 64MB PCI R200, I
believe the maximum sized cube map, with full mipmaps, would be
256x256.
There are three possible solutions that I can see.
1. Continue using the existing technique for calculating the maximum
texture size. There are still cases where we could fail to bind
textures to all available units.
2. Advertise the maximum hardware supported limit, but fall back to
software rendering when all textures (and vertex buffers) cannot fit
in memory.
3. Advertise the maximum hardware supported limit, but silently reduce
the LOD on larger textures that will not fit. This will only help
with mipmapped textures. We could also fall back to software TCL to
remove vertex data from memory.
3.14 Interaction with XAA (open)
What would be required to make the memory manager usable by the rest of
XFree86 for allocating pixmaps and the display buffer?
3.15 How are pinned blocks reclaimed when a process dies? (open)
Each process can have numerous blocks pinned. These can either be
framebuffer type objects or other objects that are temporarily pinned so
that a fence can be set. The whole point of pinning an object is to prevent
it from being reclaimed. However, if the process dies without releasing the
blocks, the system must reclaim them.
3.16 Wrapping _tnl_run_pipeline may not work (open)
Keith Whitwell pointed out that wrapping _tnl_run_pipeline (section
2.2.2.1) may not work.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg09119.html
}}}
{{{ 4. Revision History
0.1 (2003/02/06) Initial version for internal pre-review.
0.2 (2003/02/07) Added BUFFLAG_RESERVED0 to remove the gap between
BUFFLAG_CACHABLE and BUFFLAG_DRIVER0.
Fixed numerous simple errors.
Expanded the introduction to '2.1.1 Memory block'.
Expanded '2.1.2 Available block ID list'.
Expanded '2.1.3 Modified block list'.
0.3 (2003/02/07) Updated the description of the return codes for
driBufferWasLost.
Initial posting to dri-devel.
0.4 (2003/02/24) Added issue '3.14 Interaction with XAA'.
Added issue '3.15 How are pinned blocks reclaimed when a
process dies?'
Added issue '3.16 Wrapping _tnl_run_pipeline may not work'.
Added section '2.1.5 driMemoryBuffer'.
Fixed a couple typographical errors.
}}}
{{{ 5. References
[1] "Streaming video through glTexSubImage2D,"
http://www.mail-archive.com/dri-devel%40lists.sourceforge.net/msg08875.html
[2] "APPLE_client_storage,"
http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt
[3] "ARB_pixel_buffer_object," This specification is not yet available.
}}}