And as I've just started looking at GM107 traces to fix up tessellation shader attribute address calculations, I noticed the following unknown bits in CommonWord3 of TCP shaders:
PB: 0x00000021 GM107_3D.SP[0x2].SELECT = { ENABLE | PROGRAM = TCP }
PB: 0x00000830 GM107_3D.SP[0x2].START_ID = 0x830
HEADER:
0x04210861 0 = { SPH = VTG | VERSION = 3 | KIND = TCP | GMEM_STORE | SASS_VERS
0x06000000 1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 6 }
0x03000000 2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 3 }
0x60000000 3 = { WARP_CSTACK_SIZE = 0 | 0x60000000 }
0xff000000 4 = { MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0xff }
0xf0000000 ATTR_EN_0 = 0xf0000000
0x00000000 ATTR_EN_1 = 0
0x00000000 ATTR_EN_2 = 0
0x00000000 ATTR_EN_3 = 0
0x00000000 ATTR_EN_4 = 0
0x00000000 ATTR_EN_5 = { 0 }
0x00000000 11 = 0
0x00000000 12 = 0
0x0000f000 EXPORT_EN_0 = { HPOS = 0xf }
0x00000000 EXPORT_EN_1 = 0
0x00000000 EXPORT_EN_2 = 0
0x00000000 EXPORT_EN_3 = 0
0x00000000 EXPORT_EN_4 = 0
0x00000000 EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000 19 = 0

Anything that we need to also be setting?

  -ilia

On Mon, Jun 22, 2015 at 9:10 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> And an additional question: I have a trace here where a reserved bit
> from CommonWord0 is set. Is that just random values that aren't
> cleared by the driver, or does it have some significance? Here is the
> full shader:
>
> HEADER:
> 0x06040461 0 = { SPH = VTG | VERSION = 3 | KIND = VP_B |
> SASS_VERSION = 2 | LDST_ENABLE | SO_MASK = 0 | 0x2000000 }
> 0x00000000 1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 0 }
> 0x00000000 2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 0 }
> 0x00000000 3 = { WARP_CSTACK_SIZE = 0 | OUTPUT_PRIM = 0 }
> 0x00000000 4 = { MAX_OUTPUT_VERTS = 0 | MIN_OUT_READ_SLOT = 0 |
> MAX_OUT_READ_SLOT = 0 }
> 0x00000000 ATTR_EN_0 = 0
> 0x00000000 ATTR_EN_1 = 0
> 0x00000000 ATTR_EN_2 = 0
> 0x00000000 ATTR_EN_3 = 0
> 0x00000000 ATTR_EN_4 = 0
> 0x00000000 ATTR_EN_5 = { 0 }
> 0x00000000 11 = 0
> 0x00000000 12 = 0
> 0x0001f000 EXPORT_EN_0 = { HPOS = 0xf | 0x10000 }
> 0x00000000 EXPORT_EN_1 = 0
> 0x00000000 EXPORT_EN_2 = 0
> 0x00000000 EXPORT_EN_3 = 0
> 0x00000000 EXPORT_EN_4 = 0
> 0x00000000 EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
> 0x00000000 19 = 0
> CODE:
> 00000000: a01088b0 08bcb810 sched 0x2c 0x22 0x4 0x28 0x4 0x2e 0x2f
> 00000008: 0b1ffc1e 5b601c07 set $p0 0x1 ge u32 0x0 c0[0x3858]
> 00000010: 1000003c 12000000 $p0 bra 0x38
> 00000018: 0a1c0002 64c03c07 mov b32 $r0 c0[0x3850]
> 00000020: 0a9c0006 64c03c07 mov b32 $r1 c0[0x3854]
> 00000028: 001c0000 cc800000 ld b32 $r0 cg g[$r0d]
> 00000030: 041c003c 12000000 bra 0x40
>
> 00000038: 7f9c0002 e4c03c00 C mov b32 $r0 0x0
>
> 00000040: 9c108010 090c8c10 C sched 0x4 0x20 0x4 0x27 0x4 0x23 0x43
> 00000048: 001c2802 e5c00000 cvt rn f32 $r0 u32 $r0
> 00000050: 341c0006 64c03c00 mov b32 $r1 c0[0x1a0]
> 00000058: 349c000a 64c03c00 mov b32 $r2 c0[0x1a4]
> 00000060: 351c000e 64c03c00 mov b32 $r3 c0[0x1a8]
> 00000068: 359c0012 64c03c00 mov b32 $r4 c0[0x1ac]
> 00000070: 381ffc06 7f03fc00 st b32 a[0x70] $r1 0x0 0x0
> 00000078: 3a1ffc0a 7f03fc00 st b32 a[0x74] $r2 0x0 0x0
> 00000080: 3c110d0c 08000001 sched 0x43 0x43 0x4 0x4f 0x0 0x0 0x0
> 00000088: 3c1ffc0e 7f03fc00 st b32 a[0x78] $r3 0x0 0x0
> 00000090: 3e1ffc12 7f03fc00 st b32 a[0x7c] $r4 0x0 0x0
> 00000098: 401ffc02 7f03fc00 st b32 a[0x80] $r0 0x0 0x0
> 000000a0: 001c003c 18000000 exit
>
> 000000a8: fc1c003c 12007fff C bra 0xa8
> 000000b0: 001c3c02 85800000 nop
> 000000b8: 001c3c02 85800000 nop
>
> On Sat, May 23, 2015 at 5:35 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>> On Thu, May 21, 2015 at 11:32 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>> On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmor...@nvidia.com> wrote:
>>>> Hi Ilia,
>>>>
>>>> On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote:
>>>>> Hi,
>>>>>
>>>>> As I'm looking to add some support to nouveau for features like atomic
>>>>> counters and images, I'm running into some confusion about what the
>>>>> first word of the shader header means. Here is the definition as we
>>>>> have it today:
>>>>
>>>> [...]
>>>>
>>>>> However I know that these are somewhat wrong. I've seen shaders that
>>>>> use gmem accesses (i.e. mov r0, [r0]) that just have the LMEM enable
>>>>> bit set (and they use no lmem). And I've seen additional bits set, esp
>>>>> relating to images, but I haven't spent enough time looking at all the
>>>>> variations to make sense of it yet. For example, I think that Fermi
>>>>> and Kepler+ have different meanings for some of the bits.
>>>>
>>>> Those look pretty close :)
>>>>
>>>>> I was hoping you could just release the docs for the shader headers,
>>>>> or at least the first word of the shader header.
>>>>
>>>> We've posted the specification for the full Shader Program Header to our
>>>> GPU documentation site here:
>>>>
>>>> ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
>>>>
>>>> I hope it helps clear things up.
>>>
>>> Yep, just a few follow-up questions:
>>>
>>> - SPH Type 1 and type 2 appear to be flipped wrt the tables -- "When
>>> PS is used, field SphType in CommonWord0 must be set to 1; similarly,
>>> when VTG is used, SphType in CommonWord0 must be set to 2." But the
>>> "Table 1. SPH Type 1 Definition" is clearly meant for VTG and table 2
>>> is clearly meant for PS...
>>> - You skip over SassVersion -- what is that?
>>> - You have a funny note in there -- "Triangles generated by the
>>> geometry shader always have all their edge flags set to TRUE" -- that
>>> is the *only* reference to edge flags in the whole document. Right now
>>> we do some crazy thing to get edge flags right on fermi+ (and I think
>>> we just get them wrong on tesla). Is there a way to emit edge flags
>>> from vertex shader?
>>> - To be clear: DoesLoadOrStore -- *any* load/store? Even LDC? ALD?
>>
>> Oh, and one more little correction:
>>
>> """
>> The SPH field OutputTopology sets the primitive topology of the
>> vertices that are output from the pipe stage. This field is only used
>> with geometry shaders, where the value must be greater than zero and
>> has a maximum of 1024. The allowed values are: ... [the correct values
>> for OutputTopology]
>> """
>>
>> The 1024 thing seems like it probably applies to MaxOutputVertexCount
>> in CommonWord4.
>>
>> -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
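
For readers following the thread, the CommonWord0 values quoted here (0x04210861 and 0x06040461) can be split into fields with a few shifts and masks. The sketch below is Python, not code from nouveau or envytools. The bit positions are an assumption, taken from one reading of the published Shader Program Header document and checked only against the decoded dumps in this thread. The names COMMON_WORD0_FIELDS and decode_common_word0 are hypothetical.

```python
# Assumed CommonWord0 layout: (field name, lowest bit, width in bits).
# This table is an illustration, not an authoritative definition.
COMMON_WORD0_FIELDS = [
    ("SphType", 0, 5),
    ("Version", 5, 5),
    ("ShaderType", 10, 4),
    ("MrtEnable", 14, 1),
    ("KillsPixels", 15, 1),
    ("DoesGlobalStore", 16, 1),
    ("SassVersion", 17, 4),
    ("Reserved", 21, 5),
    ("DoesLoadOrStore", 26, 1),
    ("DoesFp64", 27, 1),
    ("StreamOutMask", 28, 4),
]


def decode_common_word0(word):
    """Return a dict mapping each assumed field name to its value."""
    fields = {}
    for name, low_bit, width in COMMON_WORD0_FIELDS:
        mask = (1 << width) - 1
        fields[name] = (word >> low_bit) & mask
    return fields


if __name__ == "__main__":
    # The two header words quoted in the thread.
    for word in (0x04210861, 0x06040461):
        print(hex(word), decode_common_word0(word))
```

Under this assumed layout, 0x06040461 decodes to SphType 1, Version 3, ShaderType 1 and SassVersion 2, which agrees with the dump, and the bit printed there as 0x2000000 (bit 25) falls inside the reserved range.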