I'd like to revive this thread, adding nvc0 and
GL_ARB_separate_shader_objects to the picture.

The latter extends GL_EXT_separate_shader_objects to support user-defined
varyings and guarantees well-defined behaviour only if
- varyings are declared inside the gl_PerVertex/gl_PerFragment block and the
  blocks match exactly in name, type, qualification, and (most
  significantly) declaration order.
- varyings are assigned matching location qualifiers,
  e.g. layout(location = 3) in vec4 normal;
  "The number of input locations available to a shader is limited."

So, I propose to (loosely) identify GENERIC semantic indices with these
location qualifiers and let the pipe driver set a limit on the allowed
maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS), rather than demanding that it
support at least 219 of them (nvc0 offers 0x200 bytes for generic
inputs/outputs).
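
To make the idea concrete, here is a minimal sketch of such a mapping. The
helper names are hypothetical, and the 16 bytes per location is my
assumption (one vec4 of 32-bit floats per location); under that assumption
the 0x200 bytes mentioned above would hold 32 locations.

```c
#include <assert.h>

/* Assumption: one location = one vec4 of 32-bit floats = 16 bytes. */
#define BYTES_PER_LOCATION 16u

/* Hypothetical helper: number of generic locations that fit in the
 * space a driver has for generic inputs/outputs. */
static unsigned max_generic_locations(unsigned generic_bytes)
{
    return generic_bytes / BYTES_PER_LOCATION;
}

/* Hypothetical helper: identify a GLSL location qualifier with a GENERIC
 * semantic index, rejecting anything at or above the driver-reported limit.
 * Returns the semantic index, or -1 if the location is out of range. */
static int location_to_generic_index(unsigned location, unsigned max_inputs)
{
    if (location >= max_inputs)
        return -1;
    return (int)location; /* identity mapping: index == location */
}
```

A state tracker would query the limit once and reject (or renumber) any
shader whose location qualifiers exceed it.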

My motivation is mostly that the hardware routing table for shader
varyings that was present on nv50 has been removed with nvc0 (Fermi).
And I'm glad, because filling 4 routing tables (since we have 5 shader
types now) is somewhat annoying. The same goes for applying relocations
to shaders: it can be done, and it's probably not too time consuming, but
it's just plain *unnecessary* (and thus stupid) for OpenGL.

Now about d3d9 ...
1. I don't care much, since I don't see a d3d9 state tracker.
2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx
   says "n is an optional integer between 0 and the number of resources
   supported". What "supported" means here isn't clear to me, but I
   didn't find any example where someone used something OpenGL doesn't have
   (like COLOR2).
3. http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics
   says "Input semantics are similar to the values in the D3DDECLUSAGE",
   and DECLUSAGE sounds like you're limited to sane values.

Not sure if anyone wants to think about this issue at this time (since
implementation of ARB_separate_shader_objects is probably far in the GL4
future), but I'd be happy about any comments.

Regards,
Christoph

On 04/13/2010 12:55 PM, Luca Barbieri wrote:
> This patch series is intended to resolve the issue of semantic-based shader 
> linkage in Gallium.
> It can also be found in the RFC-gallium-semantics branch.
> 
> It does not change the current Gallium design, but rather formalizes some 
> limitations to it, and provides infrastructure to implement this model more 
> easily in drivers, along with a full nv30/nv40 implementation.
> 
> These limitations are added to allow an efficient implementation for both 
> hardware lacking special support and hardware having support but also special 
> constraints.
> 
> Note that this does NOT resolve all issues, and quite a bit is left to 
> future refinement.
> 
> In particular, the following issues are still open:
> 1. COLOR clamping (and floating point framebuffers)
> 2. A linkage table CSO that allows non-identity linkage to be specified
> 3. BCOLOR/FACE-related issues
> 4. Adding a cap to inform the state tracker that more than 219 generic 
> indices are provided
> 
> This topic was already very extensively discussed.
> See 
> http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg10865.html 
> for some early inconclusive discussion around an early implementation that 
> modified the GLSL linker (which is NOT being proposed here)
> See 
> http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12016.html 
> for some more discussion that seemed to mostly reach a consensus over the 
> approach proposed here.
> See in particular 
> http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12041.html .
> 
> That said, I'm going to try to repeat all information here, partially by 
> copy&pasting from earlier messages.
> This message should probably be adapted into gallium/docs if/when this is 
> accepted.
> 
> Here is the short summary; the long rationale follows after it.
> 
> The proposal here is to add the following limitations to Gallium, for the 
> intermediate semantics:
> 1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was 
> never merged
> 2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with 
> semantic index 0
> 3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this 
> doesn't apply to fragment outputs)
> 4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR 
> is not used
> 5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR 
> IS used
> 6. GENERIC can be used with semantic indices 0-255 on almost all drivers 
> (those that don't need the 0-218 limitation)
> 7. Some drivers may also choose to support GENERIC with arbitrary indices, 
> but that should generally not happen
> 
> The reason for this, in short, is that this maps directly to DirectX 9 SM3, 
> which is the most problematic interface of all.
> 
> The peculiar problem we have here is that we have two competing constraints 
> that force us into choosing the exact SM3 value:
> 1. The VMware SVGA driver must deal with an SM3 host interface and would 
> ideally want to directly feed the Gallium semantics to the host
> 2. A hypothetical DirectX 9 state tracker needs to support SM3 and would 
> ideally want to directly feed the SM3 semantics to Gallium
> 
> Note that this is not a reference to the VMware DirectX 9 state tracker, 
> since its authors haven't provided details about its handling of shader 
> semantics.
> 
> SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus 
> POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that 
> wouldn't be mapped to GENERIC.
> However, Gallium drivers that don't benefit from such specific constraints 
> (as svga and r600 do) are supposed to support 256 indices, and my nv30/nv40 
> work does that.
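
The arithmetic behind the 219 figure, and the lower limit when BCOLOR is
in use, can be written out as a short sketch. The function and constant
names are made up for illustration; the numbers are the ones given in the
quoted text (two slots reserved for BCOLOR).

```c
#include <stdbool.h>

enum {
    SM3_USAGE_CLASSES = 14, /* semantic classes 0-13 */
    SM3_INDICES       = 16, /* indices 0-15 within each class */
    SM3_SPECIAL_PAIRS = 5,  /* POSITION0, PSIZE0, COLOR0, COLOR1, FOG0 */
    BCOLOR_SLOTS      = 2   /* slots reserved when BCOLOR is used */
};

/* Number of GENERIC indices every driver is expected to accept. */
static unsigned guaranteed_generic_count(bool uses_bcolor)
{
    unsigned count = SM3_USAGE_CLASSES * SM3_INDICES - SM3_SPECIAL_PAIRS;

    if (uses_bcolor)
        count -= BCOLOR_SLOTS;
    return count;
}
```

This yields 219 (indices 0-218) without BCOLOR and 217 (indices 0-216)
with it, matching points 4 and 5 of the summary.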
> 
> The expected implementation, if no hardware support exists, is to build a 
> list of relocations to apply to either the fragment or the vertex shader, and 
> patch one of them at validation time to match the other.
> Data structures are provided in gallium/auxiliary to ease this, and try to 
> minimize the number of times where this needs to be performed.
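
As an illustration of the relocation approach described above, here is a
minimal sketch that patches a fragment program to match a vertex program.
The structures, names and the instruction encoding (input register number
in the low 8 bits of a word) are invented for this example and are not the
data structures from the patch series or any real hardware format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GENERIC 256   /* GENERIC indices 0-255 */
#define SLOT_UNUSED 0xffu /* marks a GENERIC index the VP does not write */

/* Hypothetical: hardware output slot the vertex program uses for each
 * GENERIC index. */
struct vp_outputs {
    uint8_t slot_of_generic[MAX_GENERIC];
};

static void vp_outputs_init(struct vp_outputs *vp)
{
    memset(vp->slot_of_generic, SLOT_UNUSED, sizeof vp->slot_of_generic);
}

/* Hypothetical: one fragment program word that reads a GENERIC input. */
struct fp_reloc {
    size_t  word;    /* index of the instruction word to patch */
    uint8_t generic; /* GENERIC semantic index read there */
};

/* Patch every relocated word so it reads the slot the vertex program
 * writes. Returns the number of inputs with no matching VP output;
 * those words are left untouched. */
static unsigned patch_fp_inputs(uint32_t *fp_code,
                                const struct fp_reloc *relocs,
                                size_t num_relocs,
                                const struct vp_outputs *vp)
{
    unsigned unresolved = 0;

    for (size_t i = 0; i < num_relocs; i++) {
        uint8_t slot = vp->slot_of_generic[relocs[i].generic];

        if (slot == SLOT_UNUSED) {
            unresolved++;
            continue;
        }
        fp_code[relocs[i].word] =
            (fp_code[relocs[i].word] & ~(uint32_t)0xff) | slot;
    }
    return unresolved;
}
```

With indices limited to 0-255, the lookup is a flat table, so linking costs
one pass over the relocation list per vertex/fragment program pair.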
> 
> Let's now proceed to the discussion and detailed rationale, mostly 
> constructed by copy&pasting older messages.
> 
> ===============
> Michal Krol's proposal
> ===============
> 
> First of all, see Michal Krol's proposal at 
> http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
> <<
> name        index range
> ----------------------------
> POSITION    no limit?
> COLOR       0..1, explicit clamp?
> BCOLOR      0..1, explicit clamp?
> FOG         remove?
> PSIZE       0
> GENERIC     0..<max generics>
> NORMAL      remove
> FACE        0
> EDGEFLAG    0
> PRIMID      0
> INSTANCEID  0
> >>
> 
> My proposal follows this, except for limiting POSITION to 0 too.
> Not sure why Michal thought "no limit" could make sense: the POSITION is 
> fundamentally a singleton, since it is the input to the rasterizer unit.
> 
> 
> ======================
> An overview of hardware support
> ======================
> 
> Hardware with no capabilities:
> - nv30 does not support any mapping. However, we already need to patch
> fragment programs to insert constants, so we can patch input register
> numbers as well. The current driver only supports 0-7 generic indices,
> but I already implemented support for 0-255 indices with in-driver
> linkage and patching. Note that nv30 lacks control flow in fragment
> programs.
> - nv40 is like nv30, but supports fp control flow, and may have some
> configurable mapping support, with unknown behavior
> 
> Hardware with capabilities that must be configured for each fp/vp pair:
> - nv40 might have this, but the nVidia OpenGL driver does not use it
> - nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
> The current Gallium driver seems to support arbitrary 0-2^32 indices, but 
> uses an inefficient O(n^2) algorithm to be able to do that
> 
> - r300 appears to have a configurable vp->fp mapping. The current
> driver only supports 0-15 generic indices, but redefining
> ATTR_GENERIC_COUNT could be enough to have it support larger numbers.
> 
> Hardware with automatic linkage when semantics match:
> - VMware svga appears to support 14 * 16 semantics, but the current
> driver only supports 0-15 generic indices. This could be fixed by
> mapping GENERIC into all non-special SM3 semantics.
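
A sketch of what "mapping GENERIC into all non-special SM3 semantics" could
look like follows. The usage numbers are the D3DDECLUSAGE values (POSITION
= 0, PSIZE = 4, COLOR = 10, FOG = 11); the packed byte layout (usage * 16 +
index) and the function names are assumptions made for this example, not
necessarily what the svga driver or the patch series uses.

```c
#include <stdbool.h>

enum { SM3_USAGES = 14, SM3_INDICES = 16 };

/* POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 keep their own semantics. */
static bool sm3_is_special(unsigned usage, unsigned index)
{
    return (usage == 0 && index == 0) ||  /* POSITION0 */
           (usage == 4 && index == 0) ||  /* PSIZE0 */
           (usage == 10 && index <= 1) || /* COLOR0, COLOR1 */
           (usage == 11 && index == 0);   /* FOG0 */
}

/* Map GENERIC index n to the n-th non-special (usage, index) pair,
 * returned as usage * 16 + index, or -1 if n is out of range. */
static int generic_to_sm3(unsigned generic)
{
    unsigned seen = 0;

    for (unsigned usage = 0; usage < SM3_USAGES; usage++) {
        for (unsigned index = 0; index < SM3_INDICES; index++) {
            if (sm3_is_special(usage, index))
                continue;
            if (seen == generic)
                return (int)(usage * SM3_INDICES + index);
            seen++;
        }
    }
    return -1;
}
```

GENERIC 0 lands on POSITION1, GENERIC 218 on the last pair (usage 13,
index 15), and GENERIC 219 has nowhere to go, which is where the 0-218
limit comes from.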
> 
> Hardware that can do both configurable mappings and automatic linkage:
> - r600 supports linkage in hardware between matching apparently
> byte-sized semantic ids
> 
> Other hardware:
> - i915 has no hardware vertex shading.
>   The current driver is broken and only supports 0-7 indices; this seems
>   easy to fix, though.
> - Not sure about i965
> 
> ===================
> An overview of software APIs
> ===================
> 
> 1. DirectX 9 SM3 supports indices in the 0-15 range associated with 
> semantics in the 0-13 range.
> 
> A few of the name/index pairs have special meanings, but the others
> are just cosmetic as long as the fixed pipeline is not used.
> 
> Thus, SM3 wants to use 14 * 16 indices overall.
> 
> Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC
> semantics, leaving 219 semantics handled by GENERIC
> 
> 2. SM2 and non-GLSL OpenGL just want to use as many indices as the
> hardware interpolator count, sometimes limiting that further
> 
> They are the easiest and most straightforward ones.
> 
> 3. DirectX 10 seems to only require a 0-31 range.
> 
> In particular, the fxc.exe compiler allows arbitrary _strings_ and
> 32-bit indices to be specified.
> 
> However, this information is encoded as metadata in the output file, and
> the shader bytecode itself uses integers in the 0-31 range to refer to the
> metadata.
> 
> It seems that the metadata is resolved by the Microsoft DirectX 10 runtime,
> and the driver only sees 0-31 indices on the DDI interface.
> 
> However, this is a bit unclear: confirmation or correction would be
> appreciated.
> 
> 4. GLSL requires both shaders to be provided at link time, and thus does
> not constrain the implementation in any way.
> 
> However, it may be possible to mix GLSL with other shaders, leading to
> the need to reserve the texcoord slots.
> 
> In that case, GLSL will need about 8 more slots than the number of
> effectively used semantics.
> 
> This is the case with the current Mesa/Gallium implementation.
> 
> 5. GLSL with EXT_separate_shader_objects does not add requirements
> because only gl_TexCoord and other builtin varyings are supported.
> User-defined varyings are not supported
> 
> See in particular the following text from the extension:
> <<
>         It is undesirable from a performance standpoint to attempt to
>         support "rendezvous by name" for arbitrary separate shaders
>         because the separate shaders won't be naturally compiled to
>         match their varying inputs and outputs of the same name without
>         a special link step.  Such a special link would introduce an
>         extra validation overhead to binding separate shaders.  The link
>         itself would have to be deferred until glBegin time since separate
>         shaders won't match when transitioning from one set of consistent
>         shaders to another.  This special link would still create errors
>         or undefined behavior when the names of input and output varyings
>         matched but their types did not match.
> >>
> 
> 6. A hypothetical version of EXT_separate_shader_objects extended to
> support user-defined varyings would either want arbitrary 32-bit
> generic indices (by interning strings to generate the indices) or the
> ability to specify a custom mapping between shader indices
> 
> 7. A hypothetical "no-op" implementation of the GLSL linker would have
> the same requirement
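
For point 6, "interning strings to generate the indices" could be sketched
as below. The table layout, size limits and names are hypothetical; a real
implementation would more likely use a hash table than a linear scan.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INTERNED 64 /* arbitrary limits for this sketch */
#define MAX_NAME_LEN 32

/* Table shared by all separately compiled shaders, so that the same
 * varying name always yields the same generic index. */
struct intern_table {
    char     names[MAX_INTERNED][MAX_NAME_LEN];
    unsigned count;
};

/* Return the index for `name`, adding it if it has not been seen yet.
 * Returns -1 if the name is too long or the table is full. */
static int intern_varying(struct intern_table *t, const char *name)
{
    size_t len = strlen(name);

    if (len >= MAX_NAME_LEN)
        return -1;
    for (unsigned i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return (int)i;
    }
    if (t->count == MAX_INTERNED)
        return -1;
    memcpy(t->names[t->count], name, len + 1);
    return (int)t->count++;
}
```

Because indices are handed out in first-seen order across all shaders, they
grow without bound in principle, which is why this approach asks for
arbitrary (or at least large) generic indices.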
> 
> 
> ====================
> About non-GENERIC semantics
> ====================
> 
> Also note that non-GENERIC semantics have peculiar properties.
> 
> For COLOR and BCOLOR:
> 1. SM3 and OpenGL with glClampColor appropriately set want it to
> _not_ be clamped to [0, 1]
> 2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
> (sometimes for fixed point targets only) and may also allow using
> U8_UNORM precision for it instead of FP32
> 3. OpenGL allows two-sided lighting to be enabled, in which case COLOR in
> the fragment shader is automagically set to BCOLOR for back faces
> 4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
> Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
> The latest hardware probably supports FACING only.
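
The COLOR/BCOLOR behaviour in points 1-3 amounts to the following
selection, shown here as a plain C sketch of what a driver would have to
emit in the fragment shader on hardware that only provides FACING. The
types and names are illustrative only.

```c
#include <stdbool.h>

struct vec4 { float x, y, z, w; };

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Pick the colour a fragment shader sees for COLOR:
 * - with two-sided lighting, back faces read BCOLOR instead of COLOR
 * - with clamping enabled, the result is clamped to [0, 1] */
static struct vec4 resolve_color(struct vec4 color, struct vec4 bcolor,
                                 bool two_sided, bool front_facing,
                                 bool clamp)
{
    struct vec4 c = (two_sided && !front_facing) ? bcolor : color;

    if (clamp) {
        c.x = clamp01(c.x);
        c.y = clamp01(c.y);
        c.z = clamp01(c.z);
        c.w = clamp01(c.w);
    }
    return c;
}
```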
> 
> Any API that requires special semantics for COLOR and BCOLOR (i.e.
> non-SM3) seems to only want 0-1 indices.
> 
> Note that SM3 does *not* include BCOLOR, so basically the limits for
> generic indices would need to be conditional on BCOLOR being present
> or not (e.g. if it is present, we must reserve two semantic slots in
> svga for it).
> 
> POSITION0 is obviously special.
> PSIZE0 is also special for points.
> 
> FOG0 seems right now to just be a GENERIC with a single component.
> Gallium could be extended to support fixed function fog, which most
> DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
> to the semantic issue.
> 
> ==============
> Current Gallium users
> ==============
> 
> Right now no open-source users of Gallium fundamentally require arbitrary 
> indices.
> In particular:
> 1. GLSL and anything with similar link-by-name can of course be modified to 
> use sequential indices
> 2. ARB fragment program and vertex program use index-limited texcoord slots
> 3. g3dvl needs and uses 8 texcoord slots, indices 0-7
> 4. vega and xorg use indices 0-1
> 5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
> 6. The GL_EXT_separate_shader_objects extension does not provide
> arbitrary index matching for GLSL, but merely lets it use a model
> similar to ARB fp/vp
> 
> However, the GLSL linker needs them in its current form, and the capability 
> can be generally useful anyway.
> 
> ===================
> Discussion of possible options
> ===================
> 
> [Options from Keith Whitwell, see 
> http://www.opensource-archive.org/showthread.php?p=180719]
> a) Picking a lower number like 128, that an SM3 state tracker could
> usually be able to directly translate incoming semantics into, but which
> would force it to renumber under rare circumstances. This would make
> life easier for the open drivers at the expense of the closed code.
> 
> b) Picking 256 to make life easier for some closed-source SM3 state
> tracker, but harder for open drivers.
> 
> c) Picking 219 (or some other magic number) that happens to work with
> the current set of constraints, but makes gallium fragile in the face of
> new constraints.
> 
> d) Abandoning the current gallium linkage rules and coming up with
> something new, for instance forcing the state trackers to renumber
> always and making life trivial for the drivers...
> 
> [Options from me]
> 
> (e) Allow arbitrary 32-bit indices. This requires slightly more
> complicated data structures in some cases, and will require svga and
> r600 to fall back to software linkage if numbers are too high.
> 
> (f) Limit semantic indices to hardware interpolators _and_ introduce
> an interface to let the user specify an
> 
> Personally I think the simplest idea for now could be to have all
> drivers support 256 indices or, in the case of r600 and svga, the
> maximum value supported by the hardware, and expose that as a cap (as
> well as another cap for the number of different semantic values
> supported at once).
> The minimum guaranteed value is set to the lowest hardware constraint,
> which would be svga with 219 indices (assuming no bcolor is used).
> If some new constraints pop up, we just lower it and change SM3 state
> trackers to check for it and fall back otherwise.
> 
> This should just require simple fixes to svga and r300, and
> significant code for nv30/nv40, which is however already implemented.
> 
> Luca Barbieri (5):
>   tgsi: formalize limits on semantic indices
>   tgsi: add support for packing semantics in SM3 byte values
>   gallium/auxiliary: add semantic linkage utility code
>   nvfx: support proper shader linkage - adds glsl support
>   nvfx: expose GLSL
> 
> Michal Krol (1):
>   gallium: Remove TGSI_SEMANTIC_NORMAL.
> 
> 
> _______________________________________________
> Mesa3d-dev mailing list
> Mesa3d-dev@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mesa3d-dev

