-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nicolai Hähnle wrote:
> On Tuesday 13 October 2009 21:20:40, Ian Romanick wrote:
>> Here is the initial version of the assembly extension that was discussed
>> at XDC. This is a very early alpha version, and some parts are not yet
>> complete. At this point, I am mainly looking for two things in a review:
>
> Looks good from a very cursory look.
>
>> - Are there any issues marked "RESOLVED" where you disagree with the
>>   resolution? I'm especially interested in issues 2, 4, and 19.
>
> Note: The following replies are based on my understanding of the hardware.
> There may still be some missing or unclear information in the docs by AMD. If
> this is the case, then it can hopefully be clarified in the course of this
> thread.
>
> Issue 2:
> 1) R500 supports unstructured branching in fragment programs but not in
> vertex programs, so I'm happy about leaving it out.

Weird. That's backwards from how other SM3 GPUs do it. Usually you get
unstructured branching in the AoS vertex shader.

> 2) R500 supports address registers as described in vertex programs (including
> input/output offsets), but has no address registers at all in fragment
> programs. A loop address register can be used as offsets in loops, but the
> values loaded into this register must be determined at compile time.

I had intended to move the grammar for ARL and ARR out of the generic GPU
grammar and into the vertex program-specific grammar. The intention is
that LOOP/ENDLOOP is the only way to load an address register in a
fragment program. LOOP/ENDLOOP set the .x component and leave the other
components undefined. Since the ENDLOOP restores the "previous" value of
the address register, the last ENDLOOP restores garbage. My intention was
to provide consistent syntactic sugar over the constrained functionality
of the loop index.

> Issue 4: Agreed. R500 does not support address register math.

I looked at the documentation, and I didn't see a way to do it.

> Issue 6 (predicate registers):
> Is it correct that there is only a PSEQ instruction and not the full
> orthogonal set? The grammar includes the full orthogonal set, but the
> instruction list seems to be missing something.

The full complement is supposed to be there. I created the entry for
PSEQ, got distracted, and never came back to it.

> I assume predicate registers can be used to mask writes of ordinary ALU
> instructions. Can they also mask TEX instructions? (R500 supports both, and
> it's easy to emulate, but see caveat).

Yes. Predicates can apply to anything.

> I think we can do everything you throw at us on R500. The only difficulty is
> that R500 is a bit schizophrenic in that vertex programs are very different
> from fragment programs, but we can emulate things.
> The only stupid weakness is that swizzling predicates in fragment programs
> is essentially impossible (the only natively supported swizzles are .rgba
> and the smears .rrrr, .gggg, .bbbb, .aaaa). Obviously we can emulate this.

How painful would it be to emulate? We could restrict the set of
available predicate swizzles. I think this matches D3D, so it shouldn't
be a problem for Wine.

> Issue 11:
> R500 supposedly supports relative addressing of temporary registers in vertex
> programs, and also in fragment programs (but only using loop indices). I have
> never tested whether it actually works, though.

This would be a good feature to have. Would it be possible to hack up a
test? Do you know of any limitations?

> Issue 13:
> Similar to issue 2, R500 fragment programs support unstructured everything
> but vertex programs don't, so not overlapping sounds good to me.
>
> Issue 15:
> I know R500 fragment programs can support a CONT, but I'm not so familiar
> with the R500 vertex programs, and they seem generally less flexible.

I didn't see an explicit CONT instruction. If there's no unstructured
branch, there probably isn't a way to do it.

> Issue 17:
> I would *expect* negative addressing offsets to work on R500, but somehow I
> haven't been able to get them to work. I'll see if I can look into it again.

No hardware that I'm aware of supports true negative offsets in the
instructions. This is made to work with program parameters by putting
the base of the array at a large enough positive offset to make the
largest negative offset be zero. For example, if the program uses
my_array[A0.x - 10], the driver has to place my_array at parameter slot
10 or higher.

I don't think we can do similar trickery for attributes and results. I
think we may have to leave the negative offsets just for program
parameters and only allow positive offsets for attributes and results.
Note that NV_gpu_program4 only allows positive offsets.
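
The rebasing trick for program parameters described above can be sketched
as follows. This is only an illustration in Python, not driver code:
min_base_slot and encode_offsets are hypothetical names, and the sketch
assumes the hardware adds the address register to a non-negative constant
field encoded in the instruction.

```python
def min_base_slot(relative_offsets):
    """Smallest parameter slot at which an array can start so that
    base + offset is non-negative for every constant offset used."""
    most_negative = min(relative_offsets, default=0)
    return max(0, -most_negative)


def encode_offsets(base_slot, relative_offsets):
    """Fold the array base into each constant offset (hypothetical helper).

    The result is the non-negative constant that would be encoded in the
    instruction; the address register value is added at run time.
    """
    encoded = [base_slot + off for off in relative_offsets]
    if any(e < 0 for e in encoded):
        raise ValueError("array base too small for the negative offsets used")
    return encoded


if __name__ == "__main__":
    # my_array[A0.x - 10], my_array[A0.x], my_array[A0.x + 3]
    offsets = [-10, 0, 3]
    base = min_base_slot(offsets)
    print(base, encode_offsets(base, offsets))
```

With the my_array[A0.x - 10] example, the array has to start at slot 10 or
higher, so the encoded constant for that access becomes 0 or more.
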
NV_gpu_program4 can get away with positive-only offsets because SM4 has
general-purpose integer instructions and any register can be used for
indirect addressing.

> Issue 34:
> I don't see any support for an address register stack on R500, or anything
> else to provide for a subroutine stack.

If you can do relative addressing of temporaries, you can fake a small
stack. It's ugly, but it's possible. Of course, without address register
math it's even uglier.

I'll post an updated version in the morning with the grammar change (for
ARL and ARR) and the documentation for the other predicate-set instructions.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrVbbUACgkQX1gOwKyEAw9boQCeOP0HMtIWb3vOoKeSy4b5seMD
tMAAnROKJ61S7EBO6epL9CtYqx4B1xH1
=NhpQ
-----END PGP SIGNATURE-----

_______________________________________________
Mesa3d-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
