On Mon, 2009-05-25 at 10:33 +0200, Jerome Glisse wrote: 
> On Sun, 2009-05-24 at 11:44 -0400, Owen Taylor wrote:
>  
> > It turns out that allocating the DMA buffer is actually pretty
> > expensive, and that the expense is proportional to the size of the
> > buffer.
> > 
> > On my system, allocating a 1M buffer was taking about 1.5 milliseconds.
> > (1.7GHz Pentium M).
> > 
> > http://fishsoup.net/images/radeon-dma-buffer-profile.png
> > 
> > Shows a profile of approximately where the time is going. (This was
> > after reducing the buffer size from 1M to 256k - the proportion of
> > overall time spent in this operation was considerably greater 
> > with the 1M DMA buffers, but the relative times are the same.)
> > 
> > I think the portion under set_memory_wc() is likely optimizable by doing
> > some of the work *once* instead of per-page. (flush_all_zero_pkmaps ends
> > up being called once per page.)

[...]

> Thanks for looking into this, you have done your testing with rawhide
> (f11 kernel) so it should use Dave page allocator which have a pool
> of uc page so which should avoid to routinely call set_memory_uc and
> thus it shouldn't call set_memory_uc that much maybe somethings is
> wrong in the path which should call this allocator.

OK, with that hint the convolutions became a little bit more clear. The
exact problem is:

drm_buffer_object_create() is called with flags=DRM_BO_FLAG_MEM_TT
(DRM_BO_FLAG_CACHED is not included because this is a AGP system)
   
  bo->mem.flags = BO_FLAG_MEM_LOCAL | ...
  bo->mem.proposed_flags = BO_FLAG_MEM_TT | ...

  drm_bo_do_validate()
   drm_buffer_object_validate(..., move_buffer=TRUE)
    drm_bo_move_buffer()
      drm_bo_add_ttm()
        drm_ttm_create(page_flags=DRM_BO_FLAG_WRITE)
        drm_ttm_bind() 
          drm_ttm_set_caching(noncached=DRM_TTM_PAGE_UNCACHED, 
alloc_cached=FALSE)
            drm_ttm_populate(cached=FALSE)
              Allocates uncached memory from pool
            proceeds to mark memory uncached, expensively

The problem was introduced with:

http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=37136c81b28e45f30c8d45d212315c0b4ce6c6b0

(I'm unclear what the "upstream" referred to in the commit message is, maybe
the original had a more informative commit message...)

Prior to that set_caching would be called with no pages allocated, which is
a no-op,and then the pages allocated uncached. With that change, the above
problem occurs.

[ I was thinking I hadn't seen this pathology in profiling with a slightly
  older kernel... ]

It looks quite plausible that reversing the order of:

        if ((ttm->page_flags & DRM_TTM_PAGE_UNCACHED) == noncached)
                return 0;

        {
                ret = drm_ttm_populate(ttm, alloc_cached);
                if (ret != 0)
                        return ret;
        }

So that we do the shortcut if we allocate new memory of the right
cachedness is the right thing to do.
  
But I'd be a lot more confident of that if I had any understanding of
the commit above.

> Anyway i think the plan for newttm is to use such page allocator so
> we can avoid changing cache status of page on every allocation, also
> we would like to avoid zeroing page more than necessary for this we
> need that the userspace keep the buffer around.

Hmm, isn't the problem with that approach knowing when the kernel is
done with buffers so they are ready for reuse? You'd need some way to
query the kernel to find out what command buffers have completed.

> So in the end the call to drm_ttm_set_cahing (or equivalent in newttm)
> should happen only once at buffer creation.
> 
> Btw the limit on radeon for vertex buffer is quite high:
> max number of vertices * max stride = 65536 * (128 * 4) = 32M but i
> think it's unlikely to have a stride 128 dwords, common case would
> around 32 dwords i think.

Not quite sure what you are referring to - the DMA buffer seems to be
sized assuming a stride of 4 dwords - It's just 16 * 65536.

[ Even 32 dwords seems very high to me - for example the most
complicated format supported by glInterleavedArrays - GL_T4F_C4F_N3F_V4F
-  is just 15 dwords. ]

- Owen



------------------------------------------------------------------------------
Register Now for Creativity and Technology (CaT), June 3rd, NYC. CaT
is a gathering of tech-side developers & brand creativity professionals. Meet
the minds behind Google Creative Lab, Visual Complexity, Processing, & 
iPhoneDevCamp asthey present alongside digital heavyweights like Barbarian
Group, R/GA, & Big Spaceship. http://www.creativitycat.com 
_______________________________________________
Mesa3d-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev

Reply via email to