Let's focus on what needs to be done, looking at specific features (or
fixes) and how we could implement each one:
A) Printing the input expression instead of re-constructing it. As
Joseph explained, this will fix the problems that Aldy mentioned
(PR3544[123] and PR35742) and this requires:
1) For non-preprocessed expr we need at least two locations per expr
(beg/end). This will require changes on the build_* functions to
handle multiple locations.
1b) For each preprocessed token, we would need to keep two locations:
one for the preprocessed location and another for the original
location. As Joseph pointed out, ideally we should be able to
find a way to track this with a single location_t object so we do
not need 4 locations per expr.
2) Changes in the parser to pass down the correct locations to the
build_* functions.
3) A location(s) -> source strings interface and machinery. Ideally,
this should be more or less independent of CPP, so CPP (through
the diagnostics machinery) calls into this when needed and not
the other way around. This can be implemented in several ways:
a) Keeping the CPP buffers in memory and having in line-maps
pointers directly into the buffers contents. This is easy and
fast but potentially memory consuming. Care must be taken to
handle charsets, tabs, etc. Factoring out
anything useful from libcpp would help to implement this.
b) Re-open the file and fseek. This is not trivial since we need
to do it fast but still do all character conversions that we
did when libcpp opened it the first time. This is
approximately what Clang (LLVM) does and it seems they can do
it very fast by keeping a cache of the buffers they have
reopened. I think that thanks to our line-maps implementation,
we can do the seeking considerably more efficiently in terms of
computation
time. However, opening files is quite embedded into CPP, so
that would need to be factored out so we can avoid any
unnecessary CPP stuff when reopening but still do it
*properly* and *efficiently*.
4) Changes in the diagnostics machinery to extract locations from
expr and print a string from the source file instead of
re-constructing things.
5) Handle locations during folding or avoid aggressive folding in
the front-ends.
6) Handle locations during optimisation or update middle-end
diagnostics to not rely on perfect location information. This
probably means not using %qE, no column info, and similar
limitations. The trade-offs need to be investigated.
B) Printing accurate column information. This requires:
*) Preprocessed/original locations in a single location_t. Similar
to (A.1b) above.
*) Changes in the parser to pass down the correct
locations to diagnostics machinery. Similar to (A.2) above.
B.1) Changes in the testsuite to enable testing column numbers.
C) Consistent diagnostics. This requires:
C.1) Make CPP use the diagnostics machinery. This will fix part of
PR7263 and other similar bugs where there is a mismatch
between the diagnostics machinery and CPP's own diagnostics
machinery.
*) Preprocessed/original locations in a single location_t. This
will avoid different behaviour when a token comes from a macro
expansion. Similar to (A.1b) above.
D) Printing Ranges. This requires:
*) Printing accurate column information. Similar to (B) above.
*) A location(s) -> source strings interface and machinery. Similar
to (A.3) above.
*) Changes in the parser to pass down ranges. Similar to (A.2) above.
D.1) Changes in the testsuite to enable testing ranges.
D.2) Changes in the diagnostics machinery to handle ranges.
E) Caret diagnostics. This requires:
*) Printing accurate column information. Similar to (B) above.
*) A location(s) -> source strings interface and machinery. Similar
to (A.3) above.
E.1) Changes in the diagnostics machinery to print the source line
and a caret.
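To make (A.3a), (D.2) and (E.1) a bit more concrete, here is a minimal
sketch in C of a location -> source string lookup over a buffer kept in
memory, plus the caret/range line that would be printed under it. All
the names here (src_buffer, src_range, src_get_line,
src_format_underline, src_print_diagnostic) are made up for
illustration; this is not GCC's location_t, line-maps or diagnostics
API. It assumes byte columns and ignores charset conversion; it only
copies tabs into the underline so the caret lines up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for illustration; not GCC's location_t or line-maps.  */
struct src_buffer
{
  const char *text;  /* Whole file contents kept in memory, as in (A.3a).  */
  size_t len;        /* Number of bytes in TEXT.  */
};

struct src_range
{
  unsigned line;       /* 1-based line number.  */
  unsigned col_begin;  /* 1-based byte column of the first character.  */
  unsigned col_end;    /* 1-based byte column of the last character.  */
};

/* Return a pointer to the start of 1-based LINE inside BUF and store its
   length (without the newline) in *LEN.  Return NULL if there is no such
   line.  A real implementation would use a line table, not a scan.  */
static const char *
src_get_line (const struct src_buffer *buf, unsigned line, size_t *len)
{
  const char *p, *end, *nl;

  assert (buf != NULL && len != NULL);
  p = buf->text;
  end = buf->text + buf->len;
  if (line == 0)
    return NULL;
  while (--line > 0)
    {
      nl = memchr (p, '\n', (size_t) (end - p));
      if (nl == NULL)
        return NULL;
      p = nl + 1;
    }
  if (p >= end)
    return NULL;
  nl = memchr (p, '\n', (size_t) (end - p));
  *len = (size_t) ((nl != NULL ? nl : end) - p);
  return p;
}

/* Build the underline for range R over the source line SRC of LEN bytes:
   a '^' at col_begin followed by '~' up to col_end.  Tabs before the
   caret are copied so the caret lines up for any tab width.  Return 0 on
   success, -1 if the range does not fit the line or OUT is too small.  */
static int
src_format_underline (const char *src, size_t len, const struct src_range *r,
                      char *out, size_t out_size)
{
  size_t i;

  if (r->col_begin == 0 || r->col_end < r->col_begin
      || r->col_end > len || r->col_end >= out_size)
    return -1;
  for (i = 0; i < r->col_end; i++)
    {
      if (i + 1 < r->col_begin)
        out[i] = (src[i] == '\t') ? '\t' : ' ';
      else if (i + 1 == r->col_begin)
        out[i] = '^';
      else
        out[i] = '~';
    }
  out[i] = '\0';
  return 0;
}

/* Print "file:line:col: msg", the source line and the underline to FP.
   Return 0 on success, -1 if the range cannot be resolved.  */
static int
src_print_diagnostic (FILE *fp, const struct src_buffer *buf,
                      const char *file, const struct src_range *r,
                      const char *msg)
{
  size_t len;
  char underline[256];
  const char *src = src_get_line (buf, r->line, &len);

  if (src == NULL
      || src_format_underline (src, len, r, underline, sizeof underline) != 0)
    return -1;
  fprintf (fp, "%s:%u:%u: %s\n", file, r->line, r->col_begin, msg);
  fprintf (fp, "%.*s\n%s\n", (int) len, src, underline);
  return 0;
}
```

The point of the sketch is the direction of the dependency: the
diagnostics side asks a small interface for the text of a line and
never re-constructs the expression, so the same lookup could serve
ranges and carets alike.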
I have copied this to the wiki so anyone can update it or add
comments: http://gcc.gnu.org/wiki/Better_Diagnostics
I have some patches to make the diagnostic functions take explicit
locations and I hope to send them soon. Apart from those, I personally
don't have any specific plans to address any of the points above in
the near future because of lack of free time, and I still have a long
queue of trivial patches that I would like to get rid of before
we enter regression-only mode.
Cheers,
Manuel.