This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium. It can also be found in the RFC-gallium-semantics branch.
It does not change the current Gallium design. Instead, it formalizes some limitations to it and provides infrastructure that makes this model easier to implement in drivers, along with a full nv30/nv40 implementation.

These limitations allow an efficient implementation both on hardware that lacks special support and on hardware that has support but also has special constraints.

Note that this does NOT resolve all issues, and quite a few are left for future refinement. In particular, the following issues are still open:
1. COLOR clamping (and floating-point framebuffers)
2. A linkage table CSO that allows a non-identity linkage to be specified
3. BCOLOR/FACE-related issues
4. Adding a cap to tell the state tracker that more than 219 generic indices are provided

This topic has already been discussed very extensively.

See http://firstname.lastname@example.org/msg10865.html for some early, inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here).

See http://email@example.com/msg12016.html for further discussion that seemed to mostly reach a consensus on the approach proposed here. See in particular http://firstname.lastname@example.org/msg12041.html .

That said, I'm going to try to repeat all the relevant information here, partially by copy&pasting from earlier messages. This message should probably be adapted into gallium/docs if/when this is accepted.

Here is the short summary; the long rationale follows after it.

The proposal is to add the following limitations to Gallium for the intermediate semantics:
1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged
2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0
3. COLOR and BCOLOR can only be used with semantic indices 0-1 (note that this doesn't apply to fragment outputs)
4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used
5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used
6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation)
7. Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen

The reason for this, in short, is that these limits map directly to DirectX 9 SM3, which is the most problematic interface of all.

The peculiar problem here is that two competing constraints force us to choose exactly the SM3 value:
1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to feed the Gallium semantics directly to the host
2. A hypothetical DirectX 9 state tracker needs to support SM3 and would ideally want to feed the SM3 semantics directly to Gallium

Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about how it handles shader semantics.

SM3 ends up supporting 219 generic indices: 16 indices in each of 14 classes (14 * 16 = 224 pairs), minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0, which are the only ones that wouldn't be mapped to GENERIC (224 - 5 = 219). If BCOLOR is used, two more pairs must be reserved for it, which leaves 217 (indices 0-216).

However, Gallium drivers that don't have such specific constraints (unlike svga and r600) are supposed to support 256 indices, and my nv30/nv40 work does that.

The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and to patch one of them at validation time to match the other. Data structures are provided in gallium/auxiliary to ease this and to minimize the number of times it needs to be performed.

Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages.
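To make the relocation approach from the summary a bit more concrete, here is a minimal C sketch. It is not the gallium/auxiliary code from this series: the structures (vp_linkage, fp_reloc, fp_program), the function link_fp_to_vp, the serial-number check and the toy encoding where the whole patched word is the input register number are all hypothetical. The vertex program records which hardware slot each GENERIC index is written to; the fragment program keeps a list of the instruction words that name an input, and those words are rewritten at validation time, skipping the work when the FP has already been patched against the same VP mapping.

```c
#include <assert.h>

#define MAX_GENERIC 256     /* GENERIC indices 0-255 */
#define NO_SLOT     0xffu   /* GENERIC index not written by the VP */

/* Hypothetical per-VP linkage info: for each GENERIC index, the
 * hardware interpolator slot the vertex program writes it to. */
struct vp_linkage {
   unsigned serial;                 /* unique, nonzero; changes when slot[] changes */
   unsigned char slot[MAX_GENERIC];
};

/* Hypothetical relocation: fragment program word `word` names the input
 * register for GENERIC[generic]. In this toy encoding the whole word is
 * the register number; real hardware would patch a bitfield. */
struct fp_reloc {
   unsigned word;
   unsigned generic;
};

struct fp_program {
   unsigned *words;                 /* encoded instructions */
   const struct fp_reloc *relocs;   /* recorded once, at translation time */
   unsigned num_relocs;
   unsigned linked_serial;          /* VP serial last linked against; 0 = never */
};

/* Rewrite FP input registers to match the VP's output slots.
 * Returns 0 on success, -1 if the FP reads a GENERIC the VP doesn't write. */
static int
link_fp_to_vp(struct fp_program *fp, const struct vp_linkage *vp)
{
   unsigned i;

   if (fp->linked_serial == vp->serial)
      return 0;                     /* same mapping as last time: nothing to do */

   for (i = 0; i < fp->num_relocs; ++i) {
      const struct fp_reloc *r = &fp->relocs[i];
      unsigned slot;

      assert(r->generic < MAX_GENERIC);
      slot = vp->slot[r->generic];
      if (slot == NO_SLOT)
         return -1;
      fp->words[r->word] = slot;
   }

   fp->linked_serial = vp->serial;
   return 0;
}
```

Patching the fragment program is only one choice here; the summary allows patching either side. A real driver would also have to decide what an unmatched input should read, rather than simply failing as this sketch does.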
===============
Michal Krol's proposal
===============

First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
<<
name         index range
----------------------------
POSITION     no limit?
COLOR        0..1, explicit clamp?
BCOLOR       0..1, explicit clamp?
FOG          remove?
PSIZE        0
GENERIC      0..<max generics>
NORMAL       remove
FACE         0
EDGEFLAG     0
PRIMID       0
INSTANCEID   0
>>

My proposal follows this, except that it also limits POSITION to index 0. I'm not sure why Michal thought "no limit" could make sense: POSITION is fundamentally a singleton, since it is the input to the rasterizer unit.

======================
An overview of hardware support
======================

Hardware with no capabilities:
- nv30 does not support any mapping. However, we already need to patch fragment programs to insert constants, so we can patch input register numbers as well. The current driver only supports generic indices 0-7, but I have already implemented support for indices 0-255 with in-driver linkage and patching. Note that nv30 lacks control flow in fragment programs.
- nv40 is like nv30, but supports fragment program control flow, and may have some configurable mapping support, with unknown behavior.

Hardware with capabilities that must be configured for each fp/vp pair:
- nv40 might have this, but the nVidia OpenGL driver does not use it
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries. The current Gallium driver seems to support arbitrary indices across the whole 32-bit range (0 to 2^32-1), but uses an inefficient O(n^2) algorithm to do so
- r300 appears to have a configurable vp->fp mapping. The current driver only supports generic indices 0-15, but redefining ATTR_GENERIC_COUNT could be enough to make it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current driver only supports generic indices 0-15. This could be fixed by mapping GENERIC onto all non-special SM3 semantics.
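The svga fix suggested above, mapping GENERIC onto every SM3 (usage, index) pair not taken by a special semantic, can be sketched in C as follows. The D3DDECLUSAGE numbering (POSITION = 0, PSIZE = 4, COLOR = 10, FOG = 11, 14 usages in total) is DirectX 9's; the packing of a pair into one byte as usage << 4 | index, the sequential assignment order, the function names, and holding back the last two pairs for BCOLOR are assumptions made for illustration, not necessarily what the "packing semantics in SM3 byte values" patch does.

```c
#include <assert.h>

/* DirectX 9 declaration usages (D3DDECLUSAGE_*): 14 classes, 0..13,
 * each with usage indices 0..15. Only the ones needed here are named. */
enum { SM3_NUM_USAGES = 14, SM3_NUM_INDICES = 16 };
enum { SM3_POSITION = 0, SM3_PSIZE = 4, SM3_COLOR = 10, SM3_FOG = 11 };

/* Pairs taken by non-GENERIC Gallium semantics:
 * POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0. */
static int
sm3_pair_reserved(unsigned usage, unsigned index)
{
   return (usage == SM3_POSITION && index == 0) ||
          (usage == SM3_PSIZE && index == 0) ||
          (usage == SM3_COLOR && index <= 1) ||
          (usage == SM3_FOG && index == 0);
}

/* Number of usable GENERIC indices. When BCOLOR is used, two pairs are
 * held back for BCOLOR0/BCOLOR1, since SM3 has no BCOLOR usage. */
static unsigned
sm3_num_generics(int bcolor_used)
{
   unsigned usage, index, n = 0;

   for (usage = 0; usage < SM3_NUM_USAGES; ++usage)
      for (index = 0; index < SM3_NUM_INDICES; ++index)
         if (!sm3_pair_reserved(usage, index))
            ++n;

   assert(n == (unsigned)(SM3_NUM_USAGES * SM3_NUM_INDICES - 5)); /* 224 - 5 = 219 */
   return bcolor_used ? n - 2 : n;
}

/* Map GENERIC[g] to the g-th non-reserved pair, packed as usage << 4 | index.
 * Returns -1 if g is out of range. */
static int
sm3_generic_to_byte(unsigned g, int bcolor_used)
{
   unsigned usage, index, seen = 0;

   if (g >= sm3_num_generics(bcolor_used))
      return -1;

   for (usage = 0; usage < SM3_NUM_USAGES; ++usage)
      for (index = 0; index < SM3_NUM_INDICES; ++index) {
         if (sm3_pair_reserved(usage, index))
            continue;
         if (seen++ == g)
            return (int)(usage << 4 | index);
      }

   return -1;   /* not reached: g was range-checked above */
}
```

With this scheme GENERIC[0] lands on POSITION1 and GENERIC[218] on SAMPLE15 (usage 13), and the counts reproduce the 219 (or 217 with BCOLOR) figures from the summary.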
Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching semantic ids, which appear to be byte-sized

Other hardware:
- i915 has no hardware vertex shading. The current driver is broken and only supports indices 0-7: this seems easy to fix, though
- Not sure about i965

===================
An overview of software APIs
===================

1. DirectX 9 SM3 supports indices in the 0-15 range associated with semantics in the 0-13 range. A few of the name/index pairs have special meanings, but the others are purely cosmetic as long as the fixed-function pipeline is not used. Thus, SM3 wants to use 14 * 16 = 224 name/index pairs overall. Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC semantics, leaving 219 semantics handled by GENERIC.
2. SM2 and non-GLSL OpenGL just want to use as many indices as there are hardware interpolators, sometimes limiting that further. These are the easiest and most straightforward cases.
3. DirectX 10 seems to only require a 0-31 range. In particular, the fxc.exe compiler allows arbitrary _strings_ and 32-bit indices to be specified. However, this information is encoded as metadata in the output file, and the shader bytecode itself uses integers in the 0-31 range to refer to the metadata. It seems that the metadata is resolved by the Microsoft DirectX 10 runtime, and the driver only sees 0-31 indices on the DDI interface. However, this is a bit unclear: confirmation or correction would be appreciated.
4. GLSL requires both shaders to be provided at link time, and thus does not constrain the implementation in any way. However, it may be possible to mix GLSL with other shaders, which makes it necessary to reserve the texcoord slots. In that case, GLSL will need about 8 more slots than the number of semantics actually used. This is the case with the current Mesa/Gallium implementation.
5. GLSL with EXT_separate_shader_objects does not add requirements, because only gl_TexCoord and other builtin varyings are supported; user-defined varyings are not. See in particular the following text from the extension:
<<
It is undesirable from a performance standpoint to attempt to support "rendezvous by name" for arbitrary separate shaders because the separate shaders won't be naturally compiled to match their varying inputs and outputs of the same name without a special link step. Such a special link would introduce an extra validation overhead to binding separate shaders. The link itself would have to be deferred until glBegin time since separate shaders won't match when transitioning from one set of consistent shaders to another. This special link would still create errors or undefined behavior when the names of input and output varyings matched but their types did not match.
>>
6. A hypothetical version of EXT_separate_shader_objects extended to support user-defined varyings would want either arbitrary 32-bit generic indices (obtained by interning strings) or the ability to specify a custom mapping between shader indices
7. A hypothetical "no-op" implementation of the GLSL linker would have the same requirement

====================
About non-GENERIC semantics
====================

Also note that non-GENERIC semantics have peculiar properties.

For COLOR and BCOLOR:
1. SM3, and OpenGL with glClampColor appropriately set, want them _not_ to be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want them clamped to [0, 1] (sometimes for fixed-point targets only), and may also allow U8_UNORM precision to be used for them instead of FP32
3. OpenGL allows two-sided lighting to be enabled, in which case COLOR in the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware. The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e. non-SM3) seems to only want indices 0-1.

Note that SM3 does *not* include BCOLOR, so the limits for generic indices need to be conditional on whether BCOLOR is present (e.g. if it is present, we must reserve two semantic slots for it in svga).

POSITION0 is obviously special.
PSIZE0 is also special for points.
FOG0 currently seems to be just a GENERIC with a single component. Gallium could be extended to support fixed-function fog, which most DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal to the semantic issue.

==============
Current Gallium users
==============

Right now, no open-source user of Gallium fundamentally requires arbitrary indices. In particular:
1. GLSL and anything with similar link-by-name semantics can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to a 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp

However, the GLSL linker needs arbitrary indices in its current form, and the capability can be generally useful anyway.

===================
Discussion of possible options
===================

[Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719]

a) Picking a lower number like 128, that an SM3 state tracker could usually be able to directly translate incoming semantics into, but which would force it to renumber under rare circumstances. This would make life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state tracker, but harder for open drivers.
c) Picking 219 (or some other magic number) that happens to work with the current set of constraints, but makes gallium fragile in the face of new constraints.

d) Abandoning the current gallium linkage rules and coming up with something new, for instance forcing the state trackers to renumber always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more complicated data structures in some cases, and will require svga and r600 to fall back to software linkage if the numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce an interface to let the user specify an

Personally, I think the simplest idea for now could be to have all drivers support 256 indices or, in the case of r600 and svga, the maximum value supported by the hardware, and to expose that as a cap (as well as another cap for the number of different semantic values supported at once). The minimum guaranteed value is set by the most restrictive hardware, which would be svga with 219 indices (assuming BCOLOR is not used). If new constraints pop up, we just lower it and change SM3 state trackers to check for it and fall back otherwise.

This should just require simple fixes to svga and r300, and significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.

_______________________________________________
Mesa3d-dev mailing list
Mesa3demail@example.com
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev