Alias-Improvements Branch
=========================
The primary objective of the branch is to transition away from the use
of the virtual operand use-def chains as data-flow representation for
memory optimizations. Memory optimizers need to transition to the
use of the alias-oracle which is a query based system. For the
forseeable future they can still rely on the safety-net the virtual
operand use-def chain provides.
I propose to merge the branch at the beginning of stage1 for GCC 4.5.
Bugfixes put on the branch can be merged independently. Likewise it
should be possible to merge the points-to changes and some of the
call-clobber changes before the rest.
Other changes though are difficult or impossible to separate without
intermediate regressions.
Overview of infrastructure changes done on the branch:
- There is a single memory symbol (.MEM) per function used to build the
virtual operand use-def chain. In trunk terms you can view this as
equivalent to having a single memory partition containing all symbols.
Due to the lack of precision computing and keeping this virtual
operand use-def chain is trivial, all code supporting more complex
operations has been removed. In particular it can be assumed that
at most a single VUSE and VDEF is available per statement. Most code
has been simplified according to that assumption.
- This single-symbol virtual operand use-def chain safety-net is also
available during early optimizations (in fact, always if the IL is
in SSA form). This removes the need for special-casing memory
statements during that compilation phase and enables to use common
infrastructure during early (and IPA) optimization phases. It also
allows to schedule regular memory optimization passes during early
optimization.
- The virtual operand use-def chain is kept up-to-date by the
operand-scanner on the branch. There is no need to manually mark
any symbols for renaming (instead this is now useless and costly,
code auditing is still necessary here). As there cannot be magically
more, less or different memory symbols required (there's only one)
optimization passes usually know if they are removing a load or a
store and so can act properly (or rely on the operand scanner which
updates virtual SSA form manually in the simple cases).
- The at most two virtual operands (a VUSE and a VDEF) are now real
operands (there is a vuse and a vdef member in the gimple structure
for memory statements). This means they can share the infrastructure
present for regular operands and immediate uses. All infrastructure
dealing with the old virtual operands has been removed. The operand
structures are linked in the normal use and defs lists, the special
lists for virtual uses and defs has been removed.
This means that operand iterators have been simplified a lot. The
possibility to iterate over only virtual operands has been removed
(there is only at most a single one, directly accessible via
gimple_v{use,def}_op). Most bulk changes in optimizers had to be
done because of this simplification.
- Global data that was not kept in sync and is either unused on the
branch or very hard to keep up-to-date has been removed. This includes
the bitmap of addressable variables and the escape type per variable
and pointer.
- Points-to information has been abstracted and is directly used for
pointer information as well as call-clobber related information. All
existing call-clobber related code has been removed.
- Numerous fixes to the points-to analysis code have been done.
- The alias-oracle present on the trunk has been extended to also
use points-to alias information.
- The FRE and PRE passes using the common SCCVN infrastructure have been
improved by improving the memory statement walking infrastructure of
the alias-oracle and its precision.
- The tree DSE pass has been improved to compensate the loss of DSE
done by the tree DCE pass on the trunk (but see below).
The alias-oracle and its interface is contained in the tree-ssa-alias.[ch]
files. Common with the trunk is the refs_may_alias_p type of query
which has been accompanied by statement based clobber / use queries.
Preliminary performance measurements show comparable performance for
SPEC 2000, Polyhedron and our usual set of C++ benchmarks.
SPEC 2006 measurements are on the way, all on x86_64.
Daily testing can be monitored beyond http://gcc.opensuse.org/,
look for "alias-improvements" branch annotation and compare with the
other runs on the frescobaldi machine. Performance comparisons on other
architectures are welcome.
In general the trade-off with the new scheme is that walking statements
via the virtual use-def links will be slower because possibly (a lot)
more statements are visited. This is offsetted by leaner operand
iterators, cheaper operand scanning and updating and less memory use
for maintaining the virtual operands. And most important, a lot less
code to maintain.
Current shortcomings ("regressions") present on the branch, which I do
not plan to work on before the merge.
- Uninitialized warnings for memory which depend on the virtual operands
will no longer work (but in very small testcases).
FAIL: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12)
FAIL: gcc.dg/uninit-pr19430.c (test for warnings, line 32)
FAIL: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41)
- A trick on the 4.4 branch shadowed the issue with the C/C++ frontend
using alias set zero for char and how that relates to char members
in structures (their alias sets have a zero member).
FAIL: gcc.dg/tree-ssa/pr27799.c (test for excess errors)
- DCE no longer removes as many dead stores as on trunk.
FAIL: gcc.dg/tree-ssa/loop-36.c scan-tree-dump-not dce2 "c.array"
FAIL: gcc.dg/tree-ssa/ldist-11.c scan-tree-dump-times ldist "distributed:
split to 2 loops" 1
There are also some XPASSes. Likewise gazillions of copile-time and
memory-hog related bugs should be fixed (and an unknown amount of new
ones introduced). A bunch of optimization related PRs are fixed as well.
There are a number of tasks left to do, either on the branch or (prefered)
on trunk after merging the branch.
- The pass-pipeline needs re-tuning. We re-compute points-to information
too often and at non-optimal places (is no longer necessary for correctness
to "re-compute" alias information).
- Special "early" passes can get removed, merged or be made use of
generally in the optimization pipeline.
- The DCE / DSE situation needs to be resolved.
On the trunk DCE does remove far more dead stores than DSE does. On
the branch dead store removal from DCE is severely pessimized (because
it still relies on the virtual operands). DSE is improved on the branch,
but it runs too late and not often enough and is also very simple.
The best solution is to remove the DSE pass and do all (and proper)
dead store removal from within DCE, but this is quite an effort and
as can be seen from benchmark numbers not a blocker for merging the
branch.
Comments / Performance measurements / code-reviews welcome.
Thanks,
Richard.