Re: [PATCH 1/3] (v2) On-demand locations within string-literals

David Malcolm Fri, 29 Jul 2016 05:37:17 -0700

On Thu, 2016-07-28 at 15:16 -0600, Martin Sebor wrote:
> On 07/28/2016 02:38 PM, Martin Sebor wrote:
> > On 07/28/2016 02:12 PM, David Malcolm wrote:
> > > On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
> > > > On 27 July 2016 at 15:30, David Malcolm <dmalc...@redhat.com>
> > > > wrote:
> > > > > > Perhaps it could live for now in c-format.c, since it is
> > > > > > the only
> > > > > > place using it?
> > > > > 
> > > > > Martin Sebor [CC-ed] wants to use it from the middle-end:
> > > > >    https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> > > > > so it's unclear to me that c-format.c would be a better
> > > > > location.
> > > > 
> > > > Fine. He will have to figure out how to get a cpp_reader from
> > > > the
> > > > middle-end, though.
> > > 
> > > It seems to me that on-demand reconstruction of source locations
> > > for
> > > STRING_CST nodes is inherently frontend-specific: unless we have
> > > the
> > > frontend record the information in some fe-independent way (which
> > > I
> > > assume we *don't* want to do, for space-efficiency), we need to
> > > be able
> > > to effectively re-run part of the frontend.
> > > 
> > > So maybe this needs to be a langhook; the c-family can use the
> > > global
> > > cpp_reader * there, and everything else can return a "not
> > > supported"
> > > code if a diagnostic requests substring location information (and
> > > the
> > > diagnostic needs to be able to cope with that).
> > 
> > The problem with the langhook approach, as I learned from my first
> > -Wformat-length attempt, is that it doesn't make the front end
> > implementation available to LTO.  So passes that run late enough
> > with LTO (like the latest version of the -Wformat-length pass
> > does) would not be able to make use of it.
> 
> I'm sorry, I didn't mean to sound like I was dismissing the idea.
> I agree that string processing is language and front-end specific.
> Having the middle end call back into the front-end also seems like
> the right thing to do, not just to make this case work, but others
> like it as well.  So perhaps the problem to solve is how to teach
> LTO to talk to the front end.  One way to do it would be to build
> the front ends as shared libraries.


Turning frontends into shared libraries as a prerequisite would seem to
be imposing a significant burden on the patch.

Currently all that we need from the C family of frontends is the
cpp_reader and the string concatenation records.  I think we can
reconstruct the cpp_reader if we have the options, though presumably
that's per TU, so to support all this we'd need to capture e.g. the
per-TU encoding information in the LTO records, for the case where one
TU is UTF-8 encoded source to UTF-8 execution, and another TU is
EBCDIC-encoded source to UCS-4 execution (or whatever).  And there's an
issue if different TUs compiled the same header with different encoding
options.

Or... we could not bother.  This is a Quality of Implementation thing,
for improving diagnostics, and in each case, the diagnostic is required
to cope with substring location information not being available (and
the code I posted in patch 2 of the kit makes it trivial to handle that
case from a diagnostic).  So we could simply have LTO use the
fallback mode.
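
To illustrate what "coping" means here, a minimal sketch (not the code
from patch 2).  Every name in it (loc_t, substring_loc_result,
get_substring_loc, pick_diagnostic_loc) is a hypothetical stand-in, and
the have_frontend flag merely simulates the non-LTO vs. LTO cases:

```cpp
#include <cassert>

// Hypothetical stand-in for a source location; not GCC's location_t.
struct loc_t { int line; int column; };

// Result of asking for a location inside a string literal.
struct substring_loc_result
{
  bool available;  // false when the frontend cannot reconstruct it
  loc_t loc;
};

// Hypothetical query.  It only succeeds while frontend state is around;
// have_frontend == false simulates running under LTO.
static substring_loc_result
get_substring_loc (loc_t string_start, int char_offset, bool have_frontend)
{
  assert (char_offset >= 0);
  if (!have_frontend)
    return { false, string_start };
  // Toy computation: assumes a single-line literal with no escapes;
  // the +1 skips the opening quote.
  return { true, { string_start.line,
                   string_start.column + 1 + char_offset } };
}

// The diagnostic copes with failure by pointing at the whole string.
static loc_t
pick_diagnostic_loc (loc_t string_start, int char_offset, bool have_frontend)
{
  substring_loc_result r
    = get_substring_loc (string_start, char_offset, have_frontend);
  return r.available ? r.loc : string_start;
}
```

With a string starting at 10:5 and an offset of 3, this yields 10:9 when
the frontend is available and 10:5 (the start of the string) otherwise,
so the diagnostic is still emitted, just less precisely.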

There are two high-level approaches I've tried:

(a) capture the substring location information in the lexer/parser in
the frontend as it runs, and store it somehow.

(b) regenerate it "on-demand" when a diagnostic needs it.

Approach (b) is inherently going to be prone to the LTO issues you
describe, but it avoids adding to the CPU cycles/memory consumption for
the common case of not needing the information. [1]
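
As a rough illustration of the trade-off in (b), here is a toy sketch.
It is not the patch's implementation: loc_t and lazy_char_loc are
hypothetical names, the "source file" is just a vector of lines, and
only simple two-character escapes are handled.  Nothing is stored per
character up front; the literal is re-scanned only when a location is
requested:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical location type: 1-based line and column.
struct loc_t { int line; int column; };

// Approach (a) would record one loc_t per character while lexing.
// Approach (b), sketched here, keeps only the location of the opening
// quote and re-scans the source text when a diagnostic asks.
//
// Find the source location of the idx-th character of the string's
// value.  Returns false if it cannot be determined, in which case the
// caller is expected to fall back to the location of the whole string.
static bool
lazy_char_loc (const std::vector<std::string> &source, loc_t start,
               std::size_t idx, loc_t *out)
{
  if (start.line < 1
      || static_cast<std::size_t> (start.line) > source.size ())
    return false;
  const std::string &text = source[start.line - 1];

  // 0-based index of the first character after the opening quote.
  std::size_t pos = static_cast<std::size_t> (start.column);
  if (start.column < 1 || pos > text.size () || text[pos - 1] != '"')
    return false;

  for (std::size_t n = 0; pos < text.size () && text[pos] != '"'; ++n)
    {
      if (n == idx)
        {
          out->line = start.line;
          out->column = static_cast<int> (pos) + 1;  // back to 1-based
          return true;
        }
      // A simple escape such as \n occupies two source columns.
      pos += (text[pos] == '\\') ? 2 : 1;
    }
  return false;
}
```

The cost is paid only on the diagnostic path, but the sketch also shows
why (b) depends on being able to get back at the original source and
the frontend's interpretation of it.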

Is approach (b) acceptable?

Thanks
Dave

[1] with the exception of the string concatenation records, but I
believe those are tiny
