RE: [Dri-devel] The next round of texture memory management...

Jeff Hartmann Fri, 17 Jan 2003 11:09:35 -0800

Ian,
        I've looked through your general proposal and it looks really good.  Here
are some implementation things I've been thinking about.


> That may not be possible.  Right now the blocks are tracked in the
> SAREA, and that puts an upper limit on the number of block available.
> On a 64MB memory region, the current memory manager ends up with 64KB
> blocks, IIRC.  As memories get bigger (both on-card and AGP apertures),
> the blocks will get bigger.  Also right now each block only requires 4
> bytes in the SAREA.  Any changes that would be made for a new memory
> manager would make each block require more space, thereby reducing the
> number of blocks that could fit in the SAREA.

> Even if we increase the size of the SAREA, a system with 128MB of
> on-card memory and 128MB AGP aperture would require ~65000 blocks (if
> each block covered 4KB).

        Don't worry too much about this; we can create an entirely new SAREA to
hold the memory manager.  It can also be rather large; I'm thinking 128KB or
so wouldn't be a problem at all.  This will be non-swappable memory, but
that's not too big a deal.  Here is what I'm thinking of as the general block
format right now; it might not be perfect:

#define BLOCK_CAN_SWAP                  (1<<0)
#define BLOCK_LINKS_TO_NEXT             (1<<1)
#define BLOCK_CAN_BE_CLOBBERED          (1<<2)
#define BLOCK_IS_CACHABLE               (1<<3)
#define BLOCK_LOG2_USAGE_SHIFT          (4)
#define BLOCK_LOG2_USAGE_MASK           (0xf << BLOCK_LOG2_USAGE_SHIFT)
/* The stored field is log2(usage) - 1, so add 1 back when reading. */
#define GET_BLOCK_LOG2_USAGE(status) \
        ((((status) & BLOCK_LOG2_USAGE_MASK) >> BLOCK_LOG2_USAGE_SHIFT) + 1)
#define PACK_BLOCK_LOG2_USAGE(log2) \
        ((((log2) - 1) << BLOCK_LOG2_USAGE_SHIFT) & BLOCK_LOG2_USAGE_MASK)
#define BLOCK_ID_SHIFT                  8
/* Bits 27:8 hold the 20-bit block id. */
#define BLOCK_ID_MASK                   (0xfffff << BLOCK_ID_SHIFT)
#define PACK_BLOCK_ID(x) \
        (((x) << BLOCK_ID_SHIFT) & BLOCK_ID_MASK)
struct memory_block {
        u32     age_variable;
        u32     status;
};

        Where the age variable is device dependent, but I would imagine in most
cases is a monotonically increasing unsigned 32-bit number.  There needs to
be a device driver function to check if an age has passed on the hardware.
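
As a rough sketch of what such an age check could look like (the function
name and the idea that the driver can read back the last completed age are
assumptions for illustration, not an existing DRM interface):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of any existing DRM interface.
 * hw_age is the last age the hardware reports as completed;
 * block_age is the age stamped on a memory block.
 * The signed difference keeps the comparison correct across 32-bit
 * wraparound, as long as the two ages are less than 2^31 apart. */
static bool block_age_passed(uint32_t hw_age, uint32_t block_age)
{
    return (int32_t)(hw_age - block_age) >= 0;
}
```

A plain `hw_age >= block_age` would give the wrong answer once the counter
wraps, which is why the sketch compares the difference instead.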

        The status variable has some room; only the bottom 28 bits are defined at
the moment.  The first 4 bits are status flags.  If BLOCK_CAN_SWAP is set, we
can swap this block.  Swapping requires the driver to call the kernel to swap
out the block using some agp method where the contents are preserved; this
can be accomplished by card DMA.  If BLOCK_LINKS_TO_NEXT is set, we are part
of a group of blocks which must be treated as a unit.  If
BLOCK_CAN_BE_CLOBBERED is set, the driver can just overwrite this block of
memory.  If BLOCK_IS_CACHABLE is set, we can read back from this block in a
fast way, so fallbacks can directly use this block.  The BLOCK_LOG2 stuff is
a way to pack the usage of this block of memory in just a few bits.  We pack
log2 - 1, where we only accept usages of 2 bytes or more.  Using 2 bytes
could be considered empty.  We can store block usage sizes of up to 64KB in
this manner.  I think that we want 64KB to be our maximum size for a block.
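
For a sense of scale, here is a small sketch of the arithmetic, assuming the
8-byte entry above (two u32 fields) and a SAREA that holds nothing but block
entries.  The helper names are made up for illustration:

```c
#include <stdint.h>

#define ENTRY_BYTES 8u   /* assumed sizeof(struct memory_block): two u32s */

/* Number of block entries that fit in a SAREA of the given size. */
static uint32_t sarea_entries(uint32_t sarea_bytes)
{
    return sarea_bytes / ENTRY_BYTES;
}

/* Number of blocks needed to cover managed_bytes, rounding up. */
static uint32_t blocks_needed(uint64_t managed_bytes, uint32_t block_bytes)
{
    return (uint32_t)((managed_bytes + block_bytes - 1) / block_bytes);
}
```

Under those assumptions a 128KB SAREA holds 16384 entries.  Covering 256MB
(128MB on-card plus a 128MB aperture) takes 4096 blocks at 64KB each, 16384
blocks at 16KB each, and 65536 blocks at 4KB each, so 4KB blocks would not
fit without a larger SAREA.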

        Bits 27:8 would be a 20-bit number representing a block id.  Each one
would be unique, so the driver could keep track of what blocks represent a
texture.  A 20-bit number should be sufficient, since that gives us about 1
million values (2^20 = 1,048,576) to work with.
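
To make the layout concrete, here is a minimal sketch of building and reading
a status word.  The flag and field constants follow the defines above; the
function names (make_status and friends) are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_CAN_SWAP          (1u << 0)
#define BLOCK_LINKS_TO_NEXT     (1u << 1)
#define BLOCK_CAN_BE_CLOBBERED  (1u << 2)
#define BLOCK_IS_CACHABLE       (1u << 3)
#define BLOCK_LOG2_USAGE_SHIFT  4
#define BLOCK_LOG2_USAGE_MASK   (0xfu << BLOCK_LOG2_USAGE_SHIFT)
#define BLOCK_ID_SHIFT          8
#define BLOCK_ID_MASK           (0xfffffu << BLOCK_ID_SHIFT)

/* Build a status word from flags, log2 of the usage in bytes (1..16),
 * and a 20-bit block id. */
static uint32_t make_status(uint32_t flags, unsigned log2_usage, uint32_t id)
{
    assert(log2_usage >= 1 && log2_usage <= 16);
    assert(id <= (BLOCK_ID_MASK >> BLOCK_ID_SHIFT));
    return (flags & 0xfu)
         | (((uint32_t)(log2_usage - 1) << BLOCK_LOG2_USAGE_SHIFT)
            & BLOCK_LOG2_USAGE_MASK)
         | ((id << BLOCK_ID_SHIFT) & BLOCK_ID_MASK);
}

/* Usage in bytes: the stored field is log2(usage) - 1. */
static uint32_t status_usage_bytes(uint32_t status)
{
    unsigned log2_usage =
        ((status & BLOCK_LOG2_USAGE_MASK) >> BLOCK_LOG2_USAGE_SHIFT) + 1;
    return 1u << log2_usage;
}

/* The 20-bit block id stored in bits 27:8. */
static uint32_t status_id(uint32_t status)
{
    return (status & BLOCK_ID_MASK) >> BLOCK_ID_SHIFT;
}
```

For example, make_status(BLOCK_CAN_SWAP | BLOCK_LINKS_TO_NEXT, 16, 0xABCDE)
yields 0x0ABCDEF3: a swappable, linked block with 64KB in use and id 0xABCDE.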

        This is a pretty good start for a block format, I think.  We want to make
the memory management SAREA have a lock of its own; it shouldn't be a big
deal to extend the drm to provide us with one.  Or perhaps we use the normal
device lock when we do any management; I haven't decided yet.  There are
some issues to really think about here.

        This sort of implementation needs the kernel to be able to swap out a block
from agp memory.  The kernel should reserve a portion of the agp aperture
for this purpose, probably on the order of 2-4 MB.  Each allocation of the
agp aperture should be no smaller than 1MB in size, to prevent agpgart from
having to deal with too many blocks of memory.  It will also have to be no
smaller than the agp_page_shift, in case someone is using 4MB agp pages.
The kernel will blit, with a card-specified function, the designated block
from its current position to its final position in the block of agp memory
to be swapped.  When the ENTIRE block is full, the kernel will call
agpgart to swap that region out of the agp aperture.  The kernel will keep
track of what each swapped-out block contains in some manner, or might brute
force scan the shared memory area containing the swapped-out blocks.
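
A rough sketch of the staging bookkeeping described above.  The struct and
function names are hypothetical; the actual blit and the call into agpgart
are left out, and only the sizing rule and the "swap when full" test are
shown:

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_SWAP_REGION_BYTES (1u << 20)   /* 1MB floor from the text */

/* Smallest allowed swap region: at least 1MB and at least one agp page. */
static uint32_t swap_region_bytes(unsigned agp_page_shift)
{
    uint32_t page_bytes = 1u << agp_page_shift;
    return page_bytes > MIN_SWAP_REGION_BYTES ? page_bytes
                                              : MIN_SWAP_REGION_BYTES;
}

/* Hypothetical bookkeeping for one staging region in the reserved
 * part of the aperture. */
struct swap_region {
    uint32_t size;   /* total bytes in the region */
    uint32_t used;   /* bytes already filled by blitted blocks */
};

/* Reserve room for one block.  Returns the block's offset within the
 * region, or UINT32_MAX if the block does not fit. */
static uint32_t swap_region_stage(struct swap_region *r, uint32_t block_bytes)
{
    if (block_bytes > r->size - r->used)
        return UINT32_MAX;
    uint32_t offset = r->used;
    r->used += block_bytes;
    return offset;
}

/* True once the whole region is full and could be handed to agpgart. */
static bool swap_region_full(const struct swap_region *r)
{
    return r->used == r->size;
}
```

With 4KB agp pages (shift 12) the region is 1MB and holds sixteen 64KB
blocks; with 4MB agp pages (shift 22) it grows to 4MB.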

        There will be a non-backed shared memory area that contains all the swapped
out pages; the swapped pool is probably a good thing to call it.  Basically
it's a shared memory area, of say 1MB in size, that doesn't have any pages
backing it.  It will have a kernel nopage function that populates it if
needed.  Basically it will only have information in it if things are swapped
out of the aperture.

        There needs to be a kernel function which moves a block of memory into
cacheable space.  We could do this with PCI dma, or some magic conversion of
unbound agp pages.  This could be made safe, and wouldn't be a big deal with
the new agpgart vm stuff.  That way the block of agp memory could be
accessed by a fallback or some other function that needs to directly read
the texture.  Readback from normal agp memory is horrible, something on the
order of 60MB/sec.

        Those are my implementation thoughts, pretty much a rehash of some of the
things I wrote about the subject while at VA Linux.  Feel free to poke holes
through everything and make recommendations on design.  I think this sort of
direction should do what we need, but might need plenty of revision.

Cheers,
-Jeff


