On Sat, Apr 03, 2010 at 08:37:39PM +0200, Luca Barbieri wrote:
> This is somewhat nice, but without using a real compiler, the result
> will still be just a toy, unless you employ hundreds of compiler
> experts working full time on the project.
>
<SNIP - loop optimization techniques from Wikipedia>
> 
> Good luck doing all this on TGSI (especially if the developer does not
> have serious experience writing production compilers).

I agree with you that doing these kinds of optimizations is a difficult
task, but I am trying to focus my proposal on emulating branches and
loops for older hardware that doesn't have branch instructions, rather
than on performing global optimizations on the TGSI code.  I don't
think most of the loop optimizations you listed are even possible on
hardware without branch instructions.
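
To make that concrete: the kind of translation I have in mind is
flattening an if/else into straight-line code plus a select, so that
both sides are always executed and the result is picked without a
branch.  A minimal sketch of the idea in C (shader-like pseudocode,
not actual TGSI):

    /* Before: source-level control flow. */
    float shade(float cond, float a, float b)
    {
        float color;
        if (cond > 0.0f)
            color = a * 2.0f;
        else
            color = b + 1.0f;
        return color;
    }

    /* After flattening: both sides are computed unconditionally and the
     * result is chosen with a select, which maps to a single
     * non-branching instruction (a CMP-style opcode, for example). */
    float shade_flattened(float cond, float a, float b)
    {
        float then_val = a * 2.0f;                  /* always executed */
        float else_val = b + 1.0f;                  /* always executed */
        return (cond > 0.0f) ? then_val : else_val; /* select, no branch */
    }

Loops would presumably have to be handled by unrolling them up to a
fixed iteration count, since without branch instructions there is no
way to jump back.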

> Also, this does not mention all the other optimizations and analyses
> required to do the above stuff well (likely another 10-20 things).
> 
> Using a real compiler (e.g. LLVM, but also gcc or Open64), those
> optimizations are already implemented, or at least there is already a
> team of experienced compiler developers who are working full time to
> implement such optimizations, allowing you to then just turn them on
> without having to do any of the work yourself.
> 
> Note that all "X compiler is bad for VLIW or whatever GPU architecture"
> objections are irrelevant, since almost all optimizations are totally
> architecture independent.
> 
> Also note that we should support OpenCL/compute shaders (already
> available for *3* years on e.g. nv50) and those *really* need a real
> compiler (as in, something developed for years by a team of compiler
> experts, and in wide use).
> For instance, nVidia uses Open64 to compile CUDA programs, and then
> feeds back the output (via PTX) to their ad-hoc code generator.
> 
> Note that unlike Mesa/Gallium, nVidia actually had a working shader
> optimizer AND a large paid team, yet they still decided to at least
> partially use Open64.
> 
> PathScale (who seems to mainly sell an Open64-based compiler for the
> HPC market) might do some of this work (with a particular focus on a
> CUDA replacement for nv50), but it's unclear whether this will turn
> out to be generally useful (for all Gallium drivers, as opposed to
> nv50-only) or not.
> Also they plan to use Open64 and WHIRL, and it's unclear whether that
> is as well designed for embedding, and as easy to understand and
> customize, as LLVM is (please expand on this if you know more about it).
> 
> Really, the current code generation situation is totally _embarrassing_
> (and r300 is probably one of the best here, having its own compiler,
> yet it doesn't even support loops, so you can imagine how good the
> other drivers are), and ought to be fixed in a definitive fashion.
> 
> This is obviously not achievable if Mesa/Gallium contributors are
> supposed to write the compiler optimizations themselves, since clearly
> there is not even enough manpower to support a relatively up-to-date
> version of OpenGL or, say, to have drivers that can allocate and fence
> GPU memory in a sensible and fast way, implement hierarchical Z
> buffers, or do any of the other things expected from a decent driver
> that the Mesa drivers currently don't do.
> 
> In other words, a state-of-the-art optimizing compiler is not
> something one can just sit down and write from scratch, unless he is
> interested in and skilled at it, it is his main project, AND he
> manages to attract, or pay, a community of compiler experts to work
> on it.
> 
> Since LLVM already works well, has a community of compiler experts
> working on it, and is funded by companies such as Apple, there is no
> chance of attracting such a community to a new compiler, especially
> one limited to the niche of compiling shaders.
> 
> And yes, LLVM->TGSI->LLVM is not entirely trivial, but it is doable
> (obviously), and once you get past that initial hurdle, you get
> EVERYTHING FOR FREE.
> And the free work keeps coming with every commit to the LLVM
> repository; you only have to do the minimal work of keeping up with
> LLVM interface changes.
> So you can just do nothing and, a few months later, notice that your
> driver is faster in very advanced games because a new LLVM release
> automatically improved the quality of your shaders without you even
> knowing about it.
> 
> Not to mention that we could then at some point just get rid of TGSI,
> use LLVM IR directly, and have each driver implement a normal backend
> if possible.
> 
> The test of adequacy for a shader compiler is being able to say "yes,
> this code is really good: I can't easily come up with any way to
> improve it" when looking at the generated code for any example you can
> find.
> 
> Any ad-hoc compiler will most likely immediately fail such a test on
> complex examples.

I think that part of the advantage of my proposal is that the branch
instruction translation is done on the TGSI code.  So, even if the
architecture of the GLSL compiler is changed to something like
LLVM->TGSI->LLVM, these translations can still be applied by drivers
whose hardware needs them.
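
As a rough sketch of what such a TGSI-level pass could look like (the
instruction representation below is made up for illustration; a real
pass would walk tgsi_token streams, and nested IFs and loops are not
handled here):

    /* Toy stand-in for TGSI instructions. */
    enum opcode { OP_IF, OP_ELSE, OP_ENDIF, OP_ADD, OP_MUL, OP_CMP };

    struct inst { enum opcode op; int dst, src0, src1, src2; };

    /* Rewrite a flat IF/ELSE/ENDIF region into unconditional code:
     * every write inside the region goes to a fresh temporary, and a
     * CMP-style select merges it back into the original destination.
     * Returns the number of output instructions. */
    static int lower_branches(const struct inst *in, int n,
                              struct inst *out, int next_temp)
    {
        int n_out = 0, cond = -1, in_else = 0;
        int i;

        for (i = 0; i < n; i++) {
            if (in[i].op == OP_IF) {
                cond = in[i].src0;        /* remember the condition */
                in_else = 0;
            } else if (in[i].op == OP_ELSE) {
                in_else = 1;
            } else if (in[i].op == OP_ENDIF) {
                cond = -1;
            } else if (cond < 0) {
                out[n_out++] = in[i];     /* outside any IF: just copy */
            } else {
                struct inst body = in[i];
                struct inst sel;
                int tmp = next_temp++;

                body.dst = tmp;           /* compute into a temporary */
                out[n_out++] = body;

                /* dst = cond ? tmp : dst in the THEN part,
                 * dst = cond ? dst : tmp in the ELSE part. */
                sel.op = OP_CMP;
                sel.dst = in[i].dst;
                sel.src0 = cond;
                sel.src1 = in_else ? in[i].dst : tmp;
                sel.src2 = in_else ? tmp : in[i].dst;
                out[n_out++] = sel;
            }
        }
        return n_out;
    }

The real version would also need temporary register bookkeeping,
nested control flow and the loop-unrolling counterpart, but the point
stands: it is a TGSI-to-TGSI rewrite, so it does not care whether the
TGSI came from the current GLSL compiler or from an LLVM backend.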

> So, for a GSoC project, I'd kind of suggest:
> (1) Adapt the gallivm/llvmpipe TGSI->LLVM converter to also generate
> AoS code (i.e. RGBA vectors as opposed to RRRR, GGGG, etc.) if
> possible, or write one from scratch otherwise
> (2) Write an LLVM->TGSI backend, restricted to programs without any control 
> flow
> (3) Make LLVM->TGSI always work (even with control flow and DDX/DDY)
> (4) Hook up all useful LLVM optimizations
> 
> If there is still time, or as follow-up work (note that these are
> mostly complex things, at most one or two might be doable in the
> timeframe):
> (5) Do something about uniform-specific shader generation, and support
> automatically generating "pre-shaders" for the CPU (using the
> x86/x86-64 LLVM backends) for uniform-only computations
> (6) Enhance LLVM to provide any missing optimization with a significant impact
> (7) Convert existing drivers to LLVM backends, or have them expose
> more functionality to the TGSI backend via TGSI extensions (or
> currently unused features such as predicate support), and do
> driver-specific stuff (e.g. scalarization for scalar architectures)
> (8) Make sure shaders can be compiled from as large a subset of plain
> C/C++ as possible, as well as OpenCL (using clang), and add OpenCL
> support to Mesa/Gallium (some of it already exists in external
> repositories)
> (9) Compare with fglrx and nVidia's libGL/cgc/nvopencc and improve
> whatever is necessary to be equal to or better than them
> (10) Talk with LLVM developers about good VLIW code generation for the
> Radeons and to a lesser extent nv30/nv40 that need it, and find out
> exactly what the problem is here, how it can be solved and who could
> do the work
> (11) Add Gallium support for nv10/nv20 and r100/r200 using the LLVM
> DAG instruction selector to code-generate a fixed pipeline (Stephane
> Marchesin tried this already; it seems non-trivial, but it could be
> made to work at least partially, probably enough to get the Xorg state
> tracker to work on all cards and get rid of all X drivers at some
> point).
> (12) Figure out if any other compilers (Open64, gcc, whatever) can be
> useful as backends for some drivers
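
Just so we are talking about the same thing for (1): AoS keeps the
RGBA channels of one pixel together in a register, while SoA keeps the
same channel of several pixels together (the RRRR/GGGG layout the
existing gallivm code generates).  A minimal sketch in C, only to pin
down the terminology:

    /* AoS: one value holds all four channels of a single pixel. */
    struct pixel_aos { float r, g, b, a; };

    /* SoA: each member holds one channel for a group of pixels (here a
     * 2x2 quad), i.e. the RRRR / GGGG / BBBB / AAAA layout. */
    struct quad_soa { float r[4], g[4], b[4], a[4]; };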

I think (2) is probably the closest to what I am proposing, and it is
something I can take a look at.
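
A very rough sketch of the shape such a backend could take, assuming
the LLVM-C bindings, a single basic block, and scalar float math only
(operand and register handling is omitted; it just prints the TGSI
opcode it would emit):

    #include <llvm-c/Core.h>
    #include <stdio.h>

    /* Map the instructions of one basic block to TGSI-style opcodes.
     * Anything outside the supported subset (control flow, integers,
     * calls, ...) is rejected, matching the "no control flow"
     * restriction in (2). */
    static int translate_block(LLVMBasicBlockRef bb)
    {
        LLVMValueRef i;

        for (i = LLVMGetFirstInstruction(bb); i;
             i = LLVMGetNextInstruction(i)) {
            switch (LLVMGetInstructionOpcode(i)) {
            case LLVMFAdd: printf("ADD\n"); break; /* TGSI_OPCODE_ADD */
            case LLVMFSub: printf("SUB\n"); break; /* TGSI_OPCODE_SUB */
            case LLVMFMul: printf("MUL\n"); break; /* TGSI_OPCODE_MUL */
            case LLVMRet:  printf("END\n"); break; /* TGSI_OPCODE_END */
            default:
                fprintf(stderr, "unsupported LLVM instruction\n");
                return -1;
            }
        }
        return 0;
    }

Most of the real work would be mapping LLVM values to TGSI registers
and covering the full floating point instruction set, which seems
tractable once control flow is off the table.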

Thanks for your feedback.

-Tom Stellard
