Thank you for the reply - it's pretty much what I thought, though it doesn't
really address my criticism. After having some other discussions on this as
well, maybe I should clarify the issues. Here's the summary for lazy people:
1. The TGSI instruction set is fine
2. TGSI is a reasonable communication format
3. TGSI is over-engineered impractical for program transformations
4. TGSI closes the mind
5. My recommendation for internal representation in pipe drivers
6. I am not (!) calling for any changes
First, I am clearly not criticising the instruction set of TGSI. The
instruction set is perfectly fine, in tune with hardware, existing APIs, and
so on. My criticism is in how programs are stored in memory.
Also, it was stated that any criticism I have should be resolved quickly while
the number of drivers, state trackers, etc. that would have to be changed is
still small.
In my opinion, this is not important. TGSI is actually a quite reasonable
format for *communicating* programs. It is purely descriptive in the extreme
and can therefore be very stable, and stability is exactly what is needed in
APIs, so TGSI is fine for APIs.
My criticism is in how well (or rather how badly) suited TGSI is for
*transforming* programs. The best piece of evidence for this is the large
number of utility functions that exist for TGSI.
It was pointed out (and I agree; I never doubted it in the first place) that a
lot of my criticism can be mitigated using more utility functions and external
scaffolding. Now compare the amount of utilities that already exist for Mesa's
gl_program to the number of utilities available for TGSI. Once you ignore the
utilities that exist in Mesa to deal with the vastness of OpenGL, it does seem
to me like TGSI has a lot more utilities.
The point is that gl_program is easy to work with and therefore does not
require those utilities in the first place! The things that should be easy (in
particular: modifying a single instruction; inserting and removing
instructions) are quite easy, using basic operations of array manipulation
that any C programmer understands naturally. TGSI makes those same things
significantly harder without offering anything in return.
(There are more places where things in TGSI are harder than they need to be; I
mentioned the Extended Swizzle madness in my original mail.)
That is why TGSI is impractical for program transformation.
The argument of utility function culminates in tgsi_parse and tgsi_build.
That's a lot of beautifully engineered code, but that code doesn't do
*anything*. That's pretty much the definition of over-engineered.
I also claim that TGSI narrows the mind. Admittedly, this is the part where
I'm leaning out of the window the most. It narrows the mind in two ways:
Because it is needlessly complicated it emphasizes looking at the small
picture over the big picture. The second way is what I call its stream-based
nature. The only easy and natural transformation of TGSI is to scan the
original token stream forward and to create a modified copy in the process.
(I'm not saying other transformations are impossible, I'm saying that other
transformations are not represented naturally for TGSI.) When trying to solve
a given problem, this discourages experimentation with other kinds of
transformations that might be more natural for the given problem (such is the
case in radeon_nqssadce where we travel backwards along the instruction list
while determining unused vector components, eliminating dead code, and
introducing MOVs to emulate non-native swizzles in a single pass).
That was my criticism. Now for my conclusions. In my experience, an array or
doubly linked list of instructions is the most useful representation
(declarations are stored somewhere else in the meantime; most program
transformations I've worked with care rather little about declarations).
The Radeon driver uses a doubly linked list, which is very cool for the kind
of transformations we need. There are some arguments in favour of using arrays
(e.g. some register allocation algorithms), so I'm not saying linked lists are
the only way to go. Also, I haven't investigated control flow in any serious
way yet, and this might also serve to change my opinion on the matter,
possibly to something entirely different.
Now, if I were to start writing the program compiler today, I might actually
use TGSI blindly. But there is no way I'm going to convert an existing
compiler that already uses an easier-than-TGSI-to-work-with representation.
And I encourage everybody who writes a program compiler from scratch to stay
away from TGSI as an internal representation (unless you really don't need to
do intermediate transformations).
Going forward, I would appreciate it if other drivers that need the kind of
transformations we're doing were to adopt the same basic structures. Then we
could standardize on a common representation and share code. I suppose this is
most interesting for hardware that quite literally implements older DX shader
models (e.g. hardware that has limited support for swizzling). I don't know if
this relevant for Nouveau - for Intel it probably isn't.
If the hardware is so different that there is no opportunity for code sharing,
then no harm done.
cu,
Nicolai
Am Saturday 25 July 2009 18:55:09 schrieb Zack Rusin:
> On Saturday 25 July 2009 05:52:01 Nicolai Hähnle wrote:
> > Am Saturday 25 July 2009 05:15:13 schrieben Sie:
> > > I'm not sure I understand what you mean by "stream-based nature of
> > > TGSI". It's just just an array of struct tgsi_token's. So technically
> > > you could iterate from the back, middle, 2+middle or anything else. Or
> > > is the problem the fact that you don't know the explicit number of
> > > those tokens? That's what struct tgsi_shader_info is for (besides the
> > > number of tokens, it gives you a lot of other very useful things, which
> > > could be of course extended if you needed more).
> > >
> > > Have you looked at tgsi_transform_shader ? If it could simply iterate
> > > from the back would that, along tgsi_shader_info solve your problem?
> >
> > I have looked at tgsi_transform_shader, and if it could iterate from the
> > back it would already improve things.
> >
> > I wonder how you would go about iterating from the back without doing a
> > forward pass first to figure out where the boundaries of instructions
> > are. As I see it, TGSI has a similar problem to e.g. x86 opcodes: If you
> > have a pointer to one instruction, you can more or less easily calculate
> > the offset to the *next* instruction, but going to the *previous*
> > instruction is mostly guesswork because instructions are of variable
> > length.
>
> Well, that's fairly trivial. There's a number of things that we could do. A
> simple array of markers(essentially just saying which tgsi_token in the
> array is a start of a new instruction) in tgsi_shader_info would probably
> already fix it, or just changing the padding in tgsi_token to signify that
> it is a marker. I'm not sure why you're so afraid to iterate over the
> shader forward in the first place. You'll have to do it at some point
> anyway (from what I understand in your case that would be to figure out for
> example whether to shader uses texture sampling to make a decision whether
> it needs to be transformed in the first place).
> tgsi_shader_info is supposed to fill that need, so as I mentioned if you
> need more things in it we could certainly add them.
> So there's really a number of things that we could do and the question just
> comes down to, what is most convenient.
>
> Also you could just itertate backwards like this:
>
> void iterate_backwards(struct tgsi_token *tokens)
> {
> unsigned markers[256];
> unsigned num_markers = 0, i, position;
> struct tgsi_header header = *(struct tgsi_header *) &tokens[1];
>
> position = 1 + header.HeaderSize;
>
> while (position >= (1 + header.HeaderSize + header.BodySize)) {
> struct tgsi_token *itr = &tokens[position];
> markers[num_markers] = position;
> ++num_markers;
> position += itr->NrTokens;
> }
>
> for(i = num_markers - 1; i >= 0; --i) {
> struct tgsi_token *itr = &tokens[markers[i]];
> switch (itr->Type) {
> case TGSI_TOKEN_TYPE_DECLARATION:
> // do something with the declaration
> case TGSI_TOKEN_TYPE_IMMEDIATE:
> // do something with immediate
> case TGSI_TOKEN_INSTRUCTION:
> // do something with instruction
> }
> }
> }
>
> obviously untested, but it should give you a picture, of how to do it right
> now without any changes anywhere.
>
> > Of course I'm not familiar with other hardware. Maybe there is simply
> > nothing worth sharing. However, if there is something worth sharing - a
> > kind of toolbox of program transformations that drivers can pick from as
> > needed for their specific hardware - then this clearly needs a shared
> > intermediate representation.
>
> Yes, as I mentioned, if something is common then it can should be doable
> with tgsi_transform. It's really just a question of whether iteration
> backwards is so common to make the few lines that it takes to do it right
> now even easier and export that ability to the main interface (e.g. add the
> info that we compute in beginning to tgsi_shader_info or maybe create
> tgsi_backwards_iterate).
>
> z
------------------------------------------------------------------------------
_______________________________________________
Mesa3d-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev