On Wed, Mar 24, 2010 at 9:20 PM, Luca Barbieri <l...@luca-barbieri.com> wrote:
> Thanks for providing a long insightful reply.
>
>>> Transfers can then be split in "texture transfers" and "buffer transfers".
>>> Note that they are often inherently different, since one often uses
>>> memcpy-like GPU functionality, and the other often uses 2D blitter or
>>> 3D engine functionality (and needs to worry about swizzling or tiling)
>>> Thus, they are probably better split and not unified.
>>
>> My experience is that there is more in common than different about the
>> paths.  There are the same set of constraints about not wanting to
>> stall the GPU by mapping the underlying storage directly if it is
>> still in flight, and allocating a dma buffer for the upload if it is.
>> There will always be some differences, but probably no more than the
>> differences between uploading to eg a constant buffer and a vertex
>> buffer, or uploading to a swizzled and linear texture.
>
> The considerations you mentioned are indeed common between buffers and
> textures, but the actual mechanisms for performing the copy are often
> significantly different.
>
> For instance, r300g ends up calling the 3D engine via
> surface_copy->util_blitter for texture transfers, which I suppose it
> wouldn't do for buffer transfers.
>
> nv30/nv40 don't have a single way to deal with swizzled textures, and
> the driver must choose between many paths depending on whether the
> source/destination is swizzled or not, a 3D texture or not, and even
> its alignment or pitch (the current driver doesn't fully do that, and
> is partially broken for this reason).
> Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.
>
> nv50 does indeed have a common copy functionality that can handle all
> buffers and textures in a unified way (implemented as a revamped
> MEMORY_TO_MEMORY_FORMAT).
> However, an additional buffer-only path would surely be faster than
> going through the common texture path.
> In particular, for buffers tile_flags are always 0 and height is
> always 1, which allows writing a significantly simplified buffer-only
> version of nv50_transfer_rect_m2mf with no branches and no
> multiplications at all.
>
> In other words, I think most drivers would be better off implementing
> unified transfers with an "if" switching between a buffer and a
> texture path, so it may be worth using two interfaces.
>
> Also note that a buffer-only interface is significantly simplified
> since you don't need to specify:
> - face
> - level
> - zslice
> - y
> - height
> - z
> - depth
> - stride
> - slice stride
>
> While this may seem a micro-optimization, note that 3D applications
> often spend all the time running the OpenGL driver and Mesa/Gallium
> functions are already too heavy in profiles, so I think it's important
> to always keep CPU performance in mind.
>
> The code is also streamlined and easier to follow if it does not have
> to default-initialize a lot of stuff.
>
> A utility function calling the right interface can be created for
> state trackers that really need it (maybe Direct3D10, if the driver
> interface follows the user API).

I take your point, though I should point out you've double-counted z
and zslice, and face+level are one dword.

To me this speaks to another aspect of the gallium interface which is
a bit odd -- in particular the way several of our interfaces basically
copy their inputs into a structure and pass that back to the state
tracker.  Why are we doing that?  The state tracker already knows what
it asked us to do, and there is no reason to assume that it needs us
to re-present that information back to it.

The only really new information provided by the driver to the state
tracker by transfer_create + transfer_map is:
- the pointer to the data
- stride
- slice stride

If the transfer functions ended up just passing this data back, it
would reduce the overhead across the board.

Your point is still valid that the last two will be zero for buffer
transfers, though.
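
As a rough illustration of the point above, here is a minimal sketch of
a map call that hands back only the three new pieces of information.
All names (map_result, box, toy_resource, toy_map) are hypothetical and
are not part of the gallium interface; the resource is a plain linear
allocation standing in for real driver storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the only data the driver returns from a map call. */
struct map_result {
   void *ptr;              /* CPU-visible pointer to the mapped region */
   unsigned stride;        /* bytes between rows; 0 for buffers */
   unsigned slice_stride;  /* bytes between slices; 0 for buffers */
};

/* Hypothetical box describing the region the caller asked for. */
struct box {
   unsigned x, y, z;
   unsigned width, height, depth;
};

/* Toy linear resource standing in for driver storage. */
struct toy_resource {
   uint8_t *storage;
   unsigned bytes_per_pixel;
   unsigned width, height, depth;  /* height == depth == 1 for a buffer */
};

static struct map_result
toy_map(struct toy_resource *res, const struct box *b)
{
   struct map_result r;
   int is_buffer = (res->height == 1 && res->depth == 1);
   size_t row = (size_t)res->width * res->bytes_per_pixel;
   size_t slice = row * res->height;

   /* The caller already knows the box; we only validate it. */
   assert(b->x + b->width <= res->width);
   assert(b->y + b->height <= res->height);
   assert(b->z + b->depth <= res->depth);

   r.stride = is_buffer ? 0 : (unsigned)row;
   r.slice_stride = is_buffer ? 0 : (unsigned)slice;
   r.ptr = res->storage + b->z * slice + b->y * row +
           (size_t)b->x * res->bytes_per_pixel;
   return r;
}
```

The caller keeps its own copy of the box, level and so on, so nothing
is copied into a structure just to be read back.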

>
>> In DX they have
>> different nomenclature for this - the graphics API level entities are
>> resources and the underlying VMM buffers are labelled as allocations.
>> In gallium, we're exposing the resource concept, but allocations are
>> driver-internal entities, usually called winsys_buffers, or some
>> similar name.
>
> D3D10 uses buffers, sampler views and render target views as entities
> bindable to the pipeline, and the latter are constructed over either
> textures or buffers.
> Note however, that the "description structure" is actually different
> in the buffer and texture cases.
>
> For render target views, they are respectively D3D10_BUFFER_RTV and
> D3D10_TEX2D_RTV (and others for other texture types).
> The first specifies an offset and stride, while the second specifies a
> mipmap level.
> Other views have similar behavior.
>
> Buffers are directly used in the interfaces that allow binding
> vertex/index/constant buffers.
>
> Both buffers and textures are subclasses of ID3D10Resource, which is
> used by CopyResource, CopySubresourceRegion and UpdateSubresource,
> which provide a subset of the Gallium transfer functionality in
> gallium-resources.
>
> Note however that the two resources specified to CopyResource and
> CopySubresourceRegion must be of the same type.
>
> So in summary, D3D10 does indeed in some sense go in the direction of
> buffer/texture unification, but with some important differences:
> 1. Buffers and textures still exist as separate types. Note that
> there is no "texture" type, but rather a separate interface for each
> texture type, which directly inherits from ID3D10Resource
> 2. Textures are never used directly by the pipeline, but rather
> through "views" which have texture-type-specific creation methods and
> have separate interfaces
> 3. Buffers are directly used by the pipeline for vertex, index and
> constant buffers
> 4. Resources are used in copying and transfer functionality
> 5. D3D10 has a more memory-centric view of resources, providing for
> instance a D3D10_USAGE_STAGING flag, for "A resource that supports
> data transfer (copy) from the GPU to the CPU."
>
> D3D11 seems to be unchanged in this respect.
>
> So, if we want to go on a DX10-like route, how about:
> 1. Keeping pipe_buffer and pipe_texture, perhaps with a "pipe_resource
> base;" field
> 2. Considering splitting pipe_texture into pipe_texture_2d,
> pipe_texture_3d, pipe_texture_2d_array, etc.
> 3. Adding render target views and depth/stencil views, and making
> those constructible over buffers
> 4. Having equivalent transfer mechanisms for buffers and textures, but
> not necessarily unified in a single function
> 5. Eliminating the concept of pipe_surface, in favor of render target
> views and explicit subresources in transfer functionality
>
> D3D10/11 do not provide a transfer concept, but rather only
> inline_write/copy mechanisms.
> They also provide D3D10_USAGE_STAGING resources, which can be used as
> transfers with explicit copy operations.
> Resource copying/updating functionality is indeed unified between
> buffers and textures (using a "box" structure like gallium-resources
> does).
>
> As for the transfer unification, it seems to me they are better kept
> split, following OpenGL, but it may indeed not be clear without more
> driver experience.
>
> A possible middle ground, given the current status of
> gallium-resources, could be to keep buffer-specific and
> texture-specific utility functions for state trackers calling a common
> interface, and using them where possible.
>
> If it turns out that we are very often communicating between a
> buffer/texture-specific state tracker interface and
> buffer/texture-specific driver code (using the vtbl utilities) through
> an inefficient common interface, it is then easy to directly bridge
> them later by splitting the Gallium interface.
>
> Also, once we have drivers actually supporting efficient memory
> management (as opposed to the current situation where Radeon and
> GeForce drivers directly use kernel buffer objects, with terrible
> performance, and often not paying attention to uncached memory issues,
> especially for buffers), it may also be clearer whether transfers are
> a good interface, or should be replaced with user/"staging" buffers
> and user/"staging" textures with copies (like D3D10 does with
> D3D10_USAGE_STAGING).
>

Luca,

Thanks for the summary.  I'd add that there is also some information
available publicly about the D3D10 DDI, which follows a slightly
different interface to the API.  In that world, there is a single
create resource function:

http://msdn.microsoft.com/en-us/library/aa478785.aspx

and most functions with texture or buffer arguments are provided with
Resource handles, eg:

http://msdn.microsoft.com/en-us/library/aa478810.aspx

There is however clearly concern about the possible need for
specialized transfer mechanisms for particular buffer types.  It seems
like they've taken an approach that leaves the choice of whether to
specialize to the driver -- basically there are a number of
specialized map/unmap entrypoints, all with the same function
prototype.  A driver can point them all at a single generic
implementation or, if it prefers, provide specialized implementations
for some of them.  There is some discussion of these choices in the
page below:

http://msdn.microsoft.com/en-us/library/aa478736.aspx
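
To make the shape of that approach concrete, here is a small sketch of
a table of map entrypoints that all share one prototype.  The names
(resource, map_fn, map_funcs and the init functions) are made up for
illustration and are not the actual DDI names or signatures; the point
is only that the driver decides per slot whether to specialize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource; not the real DDI handle type. */
struct resource {
   void *storage;
};

/* Every map entrypoint shares this one prototype. */
typedef void *(*map_fn)(struct resource *res, unsigned subresource);

/* Hypothetical table of specialized entrypoints filled in by the driver. */
struct map_funcs {
   map_fn map_dynamic_vertex_buffer;
   map_fn map_dynamic_constant_buffer;
   map_fn map_staging;
};

/* One generic implementation that can serve every slot. */
static void *generic_map(struct resource *res, unsigned subresource)
{
   assert(res != NULL);
   (void)subresource;
   return res->storage;
}

/* A specialized path.  A real driver might skip texture-only work here;
 * this stand-in simply returns the storage pointer. */
static void *vertex_buffer_map(struct resource *res, unsigned subresource)
{
   assert(res != NULL && subresource == 0);
   return res->storage;
}

/* A driver that does not care: all slots use the generic path. */
static void init_generic_driver(struct map_funcs *f)
{
   f->map_dynamic_vertex_buffer = generic_map;
   f->map_dynamic_constant_buffer = generic_map;
   f->map_staging = generic_map;
}

/* A driver that specializes only the slot it has a faster path for. */
static void init_specialized_driver(struct map_funcs *f)
{
   init_generic_driver(f);
   f->map_dynamic_vertex_buffer = vertex_buffer_map;
}
```

Because the prototype is shared, moving a slot from the generic path to
a specialized one is a one-line change in the driver and invisible to
the caller.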

In terms of moving forward, I think your proposed middle ground is a
valid approach.  I don't view gallium-resources as the final word on
the subject, but rather a big step in the right direction in
particular to clean up the confusion over what a pipe_buffer really is
and what it is not, and to nudge drivers towards thinking about
asynchronous transfers for buffers and textures as being variations on
a common theme, rather than fundamentally disjoint operations.
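
For what it's worth, the middle ground could look something like the
sketch below: thin buffer-specific and texture-specific helpers for
state trackers, both funnelling into one box-based entrypoint.  Every
name here is hypothetical and the common function just records the
request; it is not the gallium-resources code.

```c
#include <assert.h>

/* Hypothetical region descriptor, in the spirit of a "box". */
struct box {
   unsigned x, y, z;
   unsigned width, height, depth;
};

/* Hypothetical resource: either a buffer or a texture. */
struct resource {
   int is_buffer;
   unsigned last_level;
};

/* Hypothetical record of what was asked of the common interface. */
struct transfer_request {
   struct resource *res;
   unsigned level;
   struct box box;
};

/* Stand-in for the single common entrypoint. */
static struct transfer_request
common_get_transfer(struct resource *res, unsigned level,
                    const struct box *box)
{
   struct transfer_request t;
   assert(level <= res->last_level);
   t.res = res;
   t.level = level;
   t.box = *box;
   return t;
}

/* Buffer helper: the caller supplies only an offset and a size. */
static struct transfer_request
buffer_get_transfer(struct resource *buf, unsigned offset, unsigned size)
{
   struct box box = { offset, 0, 0, size, 1, 1 };
   assert(buf->is_buffer);
   return common_get_transfer(buf, 0, &box);
}

/* 2D texture helper: the caller supplies a level and a rectangle. */
static struct transfer_request
texture_2d_get_transfer(struct resource *tex, unsigned level,
                        unsigned x, unsigned y, unsigned w, unsigned h)
{
   struct box box = { x, y, 0, w, h, 1 };
   assert(!tex->is_buffer);
   return common_get_transfer(tex, level, &box);
}
```

If profiling later shows the helpers and the driver are both
specialized with only the common call in between, the helpers mark
exactly where the interface would be split.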

I'm really keen to get gallium-resources merged - probably combined
with the buffer_usage_cleanup branch.  I suspect there are some
lingering bugs in -resources that are addressed by the cleanup branch.
Have you had a chance to do any testing of the changes I made on
-resources or -cleanup?

Keith

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
