On 20.04.2013 13:12, Vadim Girlin wrote:
On 04/20/2013 01:42 PM, Christian König wrote:
On 19.04.2013 18:50, Vadim Girlin wrote:
On 04/19/2013 08:35 PM, Christian König wrote:
Hey Vadim,

On 19.04.2013 18:18, Vadim Girlin wrote:
[SNIP]

In theory, yes, some optimizations in this branch are typically applied
at earlier compilation stages, not on the target machine code. On the
other hand, there are some differences that might make that harder,
e.g. many algorithms require SSA form, and though it's possible to do
similar optimizations without SSA, they would be hard to implement. Also
I wanted to support both the default backend and the llvm backend, for
increased testing coverage and to be able to compare the efficiency of
the algorithms in my experiments, etc.

Yeah I know, the missing SSA implementation is also something that has
always bothered me a bit about both TGSI and GLSL (though I haven't done
much with GLSL, so maybe I misjudge here).

Can you name the different algorithms used?

There is a short description of the algorithms and passes in the
notes.markdown file [1] in that branch; there are also links at the
end to the full descriptions of some algorithms, though some of them
were modified/adapted for this branch.

It's not a strict prerequisite, but I think we both agree that doing
things like LICM on R600 bytecode isn't the best idea overall (when
doing it on GLSL would be beneficial for all drivers, not only r600).

In fact there is no special LICM pass; it's done by GCM (Global
Code Motion, [2]), which could probably also be called a global
scheduler. In my branch this pass is combined with some hw-specific
scheduling logic, e.g. grouping fetch/alu instructions to reduce clause
type switching in the code and the number of required CF instructions;
potentially it could also schedule clauses to expose more parallelism
via the BARRIER bit.
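
For reference, here is a minimal sketch of the placement rule from
Click's paper [2]; the struct and function names are made up for
illustration, they are not the actual r600-sb classes:

  #include <cstdio>

  // Rough sketch of the GCM placement rule from [2]; names are invented.
  struct basic_block {
      basic_block *idom;   // immediate dominator
      int loop_depth;      // loop nesting level
  };

  // Earliest/latest legal placement for one value, as computed by the
  // schedule_early / schedule_late walks described in the paper.
  struct placement {
      basic_block *early;
      basic_block *late;
  };

  // Walk the dominator tree from the latest legal block up to the
  // earliest one and pick the shallowest loop nesting level on the way.
  // Hoisting of loop-invariant code (LICM) falls out of this rule.
  static basic_block *pick_block(const placement &p)
  {
      basic_block *best = p.late;
      for (basic_block *b = p.late;; b = b->idom) {
          if (b->loop_depth < best->loop_depth)
              best = b;
          if (b == p.early)
              break;
      }
      return best;
  }

  int main()
  {
      basic_block entry  = { nullptr, 0 };
      basic_block header = { &entry,  1 };  // loop header
      basic_block body   = { &header, 1 };  // loop body

      // Value is computable right after entry but only used in the loop
      // body: GCM places it in 'entry', i.e. hoists it out of the loop.
      placement p = { &entry, &body };
      std::printf("picked loop_depth %d\n", pick_block(p)->loop_depth);
      return 0;
  }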


Yeah I already thought that you're using something like this.

On one hand that is really good, because it is specialized and so
produces really optimal code for the r600 target. But on the other hand
it's bad, because it is specialized and so produces really optimal code
ONLY for the r600 target....

I think such a pass at a higher level (GLSL IR or TGSI) would at least need some callbacks or caps to be tunable for the target.

Anyway, the result of the GCM pass is affected by the CFG structure, so when the target applies e.g. if-conversion or any other target-specific control flow optimization, you might want to apply a similar pass again at the target instruction level for better results, and then the earlier pass on the higher-level IR doesn't look very useful.

Also there are some high-level operations that are translated into a bunch of target instructions, e.g. integer division on r600. A high-level pass can't hoist "i/5" (where i is the loop counter) out of the loop, but after translation to target instructions it's possible to hoist some of the resulting instructions, producing more efficient code.
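
To illustrate the point, here is a minimal sketch assuming the usual
multiply-by-magic-constant lowering of unsigned division by a constant;
the actual r600 expansion differs in detail, but the split into a
divisor-only part and a per-iteration part is the same:

  #include <cassert>
  #include <cstdint>

  int main()
  {
      // Divisor-only setup: 0xCCCCCCCD = ceil(2^34 / 5). A high-level
      // pass only sees "i / 5" and can't hoist it, but after lowering
      // this part depends on nothing that changes per iteration and can
      // be hoisted (or constant-folded) out of the loop.
      const uint64_t magic = 0xCCCCCCCDull;
      const unsigned shift = 34;

      for (uint32_t i = 0; i < 1000000; ++i) {
          // Per-iteration part: mul_hi + shift, stays inside the loop.
          uint32_t q = (uint32_t)((i * magic) >> shift);
          assert(q == i / 5);
      }
      return 0;
  }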

One more point is that GCM achieves its best efficiency when used together with a GVN (Global Value Numbering) pass, e.g. GCM allows GVN not to care about code placement while eliminating redundant operations, so you'd probably want to implement a high-level GVN pass as well.
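
Again just a minimal sketch of the idea, not the r600-sb code:
hash-based value numbering gives equivalent expressions the same number
wherever they occur, and a later GCM run decides where the single
surviving computation should live:

  #include <cstdio>
  #include <map>
  #include <string>
  #include <tuple>

  int main()
  {
      std::map<std::string, int> var_vn;                 // variable -> value number
      std::map<std::tuple<char, int, int>, int> expr_vn; // (op, vn, vn) -> value number
      int next_vn = 0;

      auto value_of = [&](const std::string &v) {
          if (!var_vn.count(v))
              var_vn[v] = next_vn++;
          return var_vn[v];
      };

      auto number = [&](const std::string &dst, char op,
                        const std::string &a, const std::string &b) {
          std::tuple<char, int, int> key{op, value_of(a), value_of(b)};
          bool redundant = expr_vn.count(key) != 0;
          if (!redundant)
              expr_vn[key] = next_vn++;
          var_vn[dst] = expr_vn[key];
          std::printf("%s = %s %c %s -> vn%d%s\n", dst.c_str(), a.c_str(),
                      op, b.c_str(), var_vn[dst],
                      redundant ? "  (redundant)" : "");
      };

      number("t0", '+', "a", "b");   // new value number for a+b
      number("t1", '*', "t0", "c");
      number("t2", '+', "a", "b");   // same number as t0 -> removable
      return 0;
  }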

I think it's possible to implement GVN-GCM at the GLSL or TGSI level, but I suspect it would require a lot more effort than the implementation of these passes in my branch did, and would be less efficient.


Just speculating, what would it take to make those passes run on the
LLVM Machine Instruction representation instead of your own representation?

The main difference between the IRs is the representation of control flow: r600-sb relies on the fact that the r600 arch doesn't have arbitrary control flow, which renders CFGs superfluous. Implementing these passes on CFGs would be more complicated; it would also require the computation of dominance frontiers, loop detection and analysis, etc. On r600-sb's IR these passes are greatly simplified.
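
A minimal sketch of what such a structured IR looks like; the class
names are made up for illustration, not the actual r600-sb ones.
Instead of a CFG with arbitrary edges, control flow is a tree of nested
containers, so loop membership (and dominance between siblings) is
implied by the nesting:

  #include <cstdio>
  #include <memory>
  #include <vector>

  // Illustrative only: control flow as nested containers, not a CFG.
  enum node_kind { NK_REGION, NK_IF, NK_LOOP, NK_INSTR };

  struct node {
      node_kind kind;
      node *parent = nullptr;
      std::vector<std::unique_ptr<node>> children;

      explicit node(node_kind k) : kind(k) {}

      node *add(node_kind k) {
          children.push_back(std::unique_ptr<node>(new node(k)));
          children.back()->parent = this;
          return children.back().get();
      }

      // Loop depth = number of enclosing loop containers; no separate
      // dominator/loop analysis is needed.
      int loop_depth() const {
          int d = 0;
          for (const node *p = parent; p; p = p->parent)
              if (p->kind == NK_LOOP)
                  ++d;
          return d;
      }
  };

  int main()
  {
      node shader(NK_REGION);              // top-level region
      node *loop = shader.add(NK_LOOP);    // a loop...
      node *branch = loop->add(NK_IF);     // ...containing an if...
      node *op = branch->add(NK_INSTR);    // ...containing one instruction
      std::printf("loop depth of op: %d\n", op->loop_depth()); // prints 1
      return 0;
  }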

Regarding GCM, the original algorithm as described in that pdf works on the CFG, so it shouldn't be hard to implement in LLVM, but I'm not sure how well it would fit into the LLVM infrastructure. LLVM has GVN-PRE, LICM and other passes that together do basically the same thing as GVN-GCM, so if you implemented it you might want to get rid of LLVM's own passes that duplicate that functionality, and I'm not sure whether this would be easy; possibly there are some interdependencies, etc. Also I saw mentions of some plans (e.g. [1],[2]) regarding the implementation of global code motion in LLVM, so it looks like there is already some work in progress.


Oh, I wasn't talking about replacing any LLVM passes, more like extending them to provide the same amount of functionality. Also I didn't have LLVM IR in mind while writing this, but rather the machine instruction representation they use.

Well, you have quite a lot of C++ code here, and a big chunk of it just deals with bringing the bytecode into a representation where you can run your algorithms on it.

Christian.

Vadim

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120709/146206.html
[2] http://markmail.org/message/2td3fnnggk6oripp#query:+page:1+mid:2td3fnnggk6oripp+state:results


Christian.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb/notes.markdown?h=r600-sb
[2] http://www.cs.washington.edu/education/courses/cse501/06wi/reading/click-pldi95.pdf


Regards,
Christian.





