On Thu, 2016-07-28 at 15:16 -0600, Martin Sebor wrote:
> On 07/28/2016 02:38 PM, Martin Sebor wrote:
> > On 07/28/2016 02:12 PM, David Malcolm wrote:
> > > On Wed, 2016-07-27 at 23:41 +0100, Manuel López-Ibáñez wrote:
> > > > On 27 July 2016 at 15:30, David Malcolm <dmalc...@redhat.com>
> > > > wrote:
> > > > > > Perhaps it could live for now in c-format.c, since it is
> > > > > > the only place using it?
> > > > >
> > > > > Martin Sebor [CC-ed] wants to use it from the middle-end:
> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01088.html
> > > > > so it's unclear to me that c-format.c would be a better
> > > > > location.
> > > >
> > > > Fine. He will have to figure out how to get a cpp_reader from
> > > > the middle-end, though.
> > >
> > > It seems to me that on-demand reconstruction of source locations
> > > for STRING_CST nodes is inherently frontend-specific: unless we
> > > have the frontend record the information in some fe-independent
> > > way (which I assume we *don't* want to do, for space-efficiency),
> > > we need to be able to effectively re-run part of the frontend.
> > >
> > > So maybe this needs to be a langhook; the c-family can use the
> > > global cpp_reader * there, and everything else can return a "not
> > > supported" code if a diagnostic requests substring location
> > > information (and the diagnostic needs to be able to cope with
> > > that).
> >
> > The problem with the langhook approach, as I learned from my first
> > -Wformat-length attempt, is that it doesn't make the front end
> > implementation available to LTO.  So passes that run late enough
> > with LTO (like the latest version of the -Wformat-length pass
> > does) would not be able to make use of it.
>
> I'm sorry, I didn't mean to sound like I was dismissing the idea.
> I agree that string processing is language and front-end specific.
> Having the middle end call back into the front-end also seems like
> the right thing to do, not just to make this case work, but others
> like it as well.  So perhaps the problem to solve is how to teach
> LTO to talk to the front end.  One way to do it would be to build
> the front ends as shared libraries.

Turning frontends into shared libraries as a prerequisite would seem
to be imposing a significant burden on the patch.

Currently all that we need from the C family of frontends is the
cpp_reader and the string concatenation records.  I think we can
reconstruct the cpp_reader if we have the options, though presumably
that's per TU, so to support all this we'd need to capture e.g. the
per-TU encoding information in the LTO records, for the case where
one TU is UTF-8 encoded source to UTF-8 execution, and another TU is
EBCDIC-encoded source to UCS-4 execution (or whatever).  And there's
an issue if different TUs compiled the same header with different
encoding options.

Or... we could not bother.  This is a Quality of Implementation
thing, for improving diagnostics, and in each case, the diagnostic is
required to cope with substring location information not being
available (and the code I posted in patch 2 of the kit makes it
trivial to handle that case from a diagnostic).  So we could simply
have LTO use the fallback mode.

There are two high-level approaches I've tried:

(a) capture the substring location information in the lexer/parser in
    the frontend as it runs, and store it somehow.

(b) regenerate it "on-demand" when a diagnostic needs it.

Approach (b) is inherently going to be prone to the LTO issues you
describe, but it avoids adding to the CPU cycles/memory consumption
for the common case of not needing the information. [1]

Is approach (b) acceptable?

Thanks
Dave

[1] with the exception of the string concatenation records, but I
believe those are tiny
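
To make the "fallback mode" idea concrete, here is a rough,
self-contained sketch of how a hook that may answer "not supported"
could look from the diagnostic's side.  Everything in it is
hypothetical: loc_t, struct string_literal, substring_loc_hook and the
three functions are illustrative names, not the interfaces in the
patch kit or in GCC, and the column arithmetic is only a toy stand-in
for re-running the lexer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a source location; not GCC's location_t.  */
typedef unsigned int loc_t;

/* Hypothetical stand-in for a string literal node.  */
struct string_literal
{
  loc_t whole_loc;   /* Location of the whole literal: always known.  */
  const char *text;  /* Contents of the literal.  */
};

/* Hook type (an assumption): return true and set *OUT on success,
   false when substring locations cannot be provided.  */
typedef bool (*substring_loc_hook) (const struct string_literal *,
                                    size_t idx, loc_t *out);

/* Default for anything that cannot regenerate the information, which
   in this sketch would include LTO.  */
static bool
hook_not_supported (const struct string_literal *s, size_t idx, loc_t *out)
{
  (void) s; (void) idx; (void) out;
  return false;
}

/* Toy "c-family" hook: pretends every character takes one column
   after the opening quote.  A real one would re-lex the source.  */
static bool
hook_c_family (const struct string_literal *s, size_t idx, loc_t *out)
{
  assert (out != NULL);
  if (idx >= strlen (s->text))
    return false;
  *out = s->whole_loc + 1 + (loc_t) idx;
  return true;
}

/* Diagnostic side: ask for the precise location, and fall back to the
   location of the whole literal if the hook declines.  */
static loc_t
pick_diagnostic_loc (substring_loc_hook hook,
                     const struct string_literal *s, size_t idx)
{
  loc_t loc;
  if (hook (s, idx, &loc))
    return loc;
  return s->whole_loc;
}
```

With a shape like this, a caller that only has hook_not_supported
still gets a usable (if less precise) location, which is the
behaviour the fallback mode relies on.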