On Tue, 2009-05-26 at 01:26 +0200, Jerome Glisse wrote:
> On Mon, 2009-05-25 at 11:02 -0400, Owen Taylor wrote:
> [...]
> > > Anyway, I think the plan for newttm is to use such a page
> > > allocator, so we can avoid changing the cache status of pages on
> > > every allocation. We would also like to avoid zeroing pages more
> > > than necessary; for that we need userspace to keep buffers around.
> >
> > Hmm, isn't the problem with that approach knowing when the kernel is
> > done with buffers so they are ready for reuse? You'd need some way to
> > query the kernel to find out what command buffers have completed.
>
> We do have such an interface, the bo busy ioctl; it still needs some
> work, though.
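Right - so on the userspace side, reuse would look something like this
(just a sketch; bo_is_busy() and bo_alloc() are stand-ins for whatever
the final interface ends up being called, not existing libdrm calls):

    #include <stdlib.h>

    struct bo;                                     /* kernel buffer object */
    extern int        bo_is_busy(struct bo *bo);   /* hypothetical busy-ioctl wrapper */
    extern struct bo *bo_alloc(unsigned int size); /* hypothetical allocator */

    struct dma_buf {
        struct bo      *bo;
        struct dma_buf *next;
    };

    static struct dma_buf *free_list;

    static struct dma_buf *get_dma_buf(unsigned int size)
    {
        struct dma_buf **p, *buf;

        /* Fast path: recycle a buffer the GPU has finished with, so its
         * pages keep their cache state and don't need re-zeroing. */
        for (p = &free_list; *p; p = &(*p)->next) {
            if (!bo_is_busy((*p)->bo)) {
                buf = *p;
                *p = buf->next;       /* unlink from the free list */
                return buf;
            }
        }

        /* Slow path: everything is still in flight; allocate fresh. */
        buf = malloc(sizeof(*buf));
        buf->bo = bo_alloc(size);
        buf->next = NULL;
        return buf;
    }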
>
> > > So in the end the call to drm_ttm_set_caching (or its equivalent
> > > in newttm) should happen only once, at buffer creation.
> > >
> > > Btw, the limit on radeon for vertex buffers is quite high:
> > > max number of vertices * max stride = 65536 * (128 * 4) = 32M, but
> > > I think a stride of 128 dwords is unlikely; the common case would
> > > be around 32 dwords, I think.
> >
> > Not quite sure what you are referring to - the DMA buffer seems to be
> > sized assuming a stride of 4 dwords - It's just 16 * 65536.
> >
> > [ Even 32 dwords seems very high to me - for example the most
> > complicated format supported by glInterleavedArrays - GL_T4F_C4F_N3F_V4F
> > - is just 15 dwords. ]
> >
> > - Owen
>
> I took this number from the specifications; anyway, I don't think
> anything uses that much geometry in one run,
I'm still a little confused.
MAX_DMA_BUF_SZ isn't actually the maximum of anything ... it's just
the number used for the default size of the buffers we allocate
(multiplied by 16).
If radeonAllocDmaRegion() is called with a byte count greater than
16*MAX_DMA_BUF_SZ, the code will just allocate a new buffer of that
size and use all of it - roughly like the sketch below.
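In code, as I read it (a simplified sketch, not the actual Mesa
source; the 64 KiB value for MAX_DMA_BUF_SZ is my assumption from the
16 * 65536 figure above):

    /* Simplified sketch of the sizing behaviour described above; not
     * the real radeonAllocDmaRegion(). MAX_DMA_BUF_SZ = 64 KiB is
     * assumed from the 16 * 65536 figure. */
    #define MAX_DMA_BUF_SZ (64 * 1024)

    static unsigned int dma_alloc_size(unsigned int bytes)
    {
        unsigned int def = 16 * MAX_DMA_BUF_SZ;   /* 1 MiB default */

        /* Oversized requests get a dedicated buffer of exactly the
         * requested size; smaller ones are carved out of a shared
         * default-sized buffer. */
        return bytes > def ? bytes : def;
    }

    /* For scale: 65536 vertices * 128 dwords * 4 bytes = 32 MiB is the
     * theoretical hardware maximum, while the fattest
     * glInterleavedArrays format, GL_T4F_C4F_N3F_V4F, is only
     * 4 + 4 + 3 + 4 = 15 dwords (60 bytes) per vertex. */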
> and we should really be more clever about vertex buffers. Note that
> we can avoid calling dmarelease on cmdbuf flush, but the problem is
> that right now, I believe, we don't allow mapping a bo while it's in
> use by the GPU, whereas what we clearly want here is for the GPU to
> read from one part of the bo while the CPU is writing to another. I
> will think a bit about that, but I think the question is: do we want
> to allow that? I.e., do we assume that userspace will ask the kernel
> to perform the proper caching operations when needed? I think it
> boils down to putting userspace in charge of cache coherency
> decisions and having it tell the kernel what to do.
The kernel certainly has more flexibility for memory management if we
unmap the buffer first, but is that flexibility useful?
We could map the buffer in the LOCAL domain initially instead of the TT
domain:
AGP - the CPU could write to the buffer faster (*), but changing the
pages to uncached is slow (which is where this conversation started)
PCIE - CPU writes to a buffer in the TT domain are already cached
Other than that, I don't see any advantage. For use-once data, having
the GPU read it out of system memory is about as good as you can get.
- Owen
(*) There's probably room to use 64- or 128-bit SSE writes to make the
current uncached case faster - see the sketch below - but then again,
it's only a problem for AGP...
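Something along these lines (a sketch; assumes SSE2, 16-byte-aligned
pointers, and a size that's a multiple of 16 bytes):

    /* Sketch: fill an uncached/write-combined buffer with 128-bit
     * non-temporal stores instead of ordinary 32-bit writes. */
    #include <emmintrin.h>
    #include <stddef.h>

    static void copy_to_wc(void *dst, const void *src, size_t n)
    {
        __m128i       *d = (__m128i *) dst;
        const __m128i *s = (const __m128i *) src;
        size_t i;

        for (i = 0; i < n / 16; i++) {
            /* movntdq bypasses the cache, so each store is a full
             * 16-byte transaction rather than four 4-byte ones. */
            _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        }
        _mm_sfence();   /* flush the write-combining buffers */
    }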