On Wed, May 14 2025, David Malcolm <dmalc...@redhat.com> wrote:

> On Thu, 2025-04-03 at 13:58 +0200, Rasmus Villemoes wrote:
>> In many setups, especially when CI and/or some meta-build system like
>> Yocto or buildroot, is involved, gcc ends up being invoked using
>> absolute path names, which are often long and uninteresting.
>> 
>> That amounts to a lot of noise both when trying to decipher the
>> warning or error, when the warning text is copy-pasted to a commit
>> fixing the issue, or posted to a mailing list.
>
> Various thoughts:
>
> The patch only touches "text" output, it doesn't affect "sarif" (or
> "json", but that's deprecated and I plan to remove it soon).  
>
> FWIW SARIF has some interesting support for redacting sensitive
> information; see e.g. 
> "3.5.2 Redactable strings"
> https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790687
>
> and "3.14.28 redactionTokens
> property"https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790762
>

Interesting. I didn't know that "SARIF" concept until just now, nor of
course that gcc could produce such output.

I also now see that there is a lot of existing -fdiagnostics-* options,
so if this is actually going to happen, it should probably be
-fdiagnostics-prefix-map and not the short form as in $subject.

> I don't know if we want to implement redaction within GCC's SARIF
> output code, or to simply defer that to 3rd party post-processing
> tools.

I think it's better to do it in (all) gcc output, as gcc has a better
chance of knowing what is actually a build path; otherwise it seems that
any post-processor would need to do more or less naive string
matching. And in many CI setups, the leading part of the build path can
be somewhat random (some docker volume mapped to some path including the
CI job ID or whatnot), so I think it might be hard to even know which
strings to replace.

> If we're going to punt on the issue of redaction, then one approach
> here might be to say that the new option only affects text output.  I'm
> not sure here.  Is redaction the only use-case, or is this also about
> cleaning up long and irrelevant paths in CI output?

It's really both. For redaction purposes, one will have to be really
careful anyway (because the errors one quotes might as well come from
Make or some other part of the build system). But as a completely random
example found online, see
https://github.com/openembedded/meta-openembedded/commit/21315bd00b2a1dcdf532929992c90496b8425192
; everything up to and including /git/ in that path is irrelevant, and
it would be nice if it simply wasn't there to begin with so simple
copy-pasting would give a nicer commit message. And even when not used
for a commit, just reading the CI output is easier without that long
leading path.

>
> [...snip...]
>
>>  
>>    const char *line_col = maybe_line_and_column (line, col);
>> diff --git a/gcc/file-prefix-map.cc b/gcc/file-prefix-map.cc
>> index 3a77b195ae3..7ae3e7f95d5 100644
>> --- a/gcc/file-prefix-map.cc
>> +++ b/gcc/file-prefix-map.cc
>
> [...snip...]
>
>> +
>> +/* Remap using -fdiag-prefix-map.  Return the GC-allocated new name
>> +   corresponding to FILENAME or FILENAME if no remapping was
>> performed.  */
>> +const char *
>> +remap_diag_filename (const char *filename)
>> +{
>> +  return remap_filename (diag_prefix_maps, filename);
>> +}
>
> The returned string is GC-allocated, via ggc_internal_alloc.  For host
> binaries linked against ggc-page.o this buffer will eventually be
> garbage-collected, but for host binaries linked against ggc-none.o this
> is a memory leak.  Perhaps the remap_filename code could simply return
> a std::string?

Perhaps, that is beyond my skills in gcc internals. I merely copy-pasted
as much as I could from the existing remapping, but then also had to do
a bit of Makefile juggling to make the result link.

But wouldn't such a change require lots of changes in the callers of the
one-liner functions defined in terms of remap_filename? Or would you
change those three functions to copy the .c_str to a GC'ed allocation as
they do now and only return the std::string from the new
remap_diag_filename?

Rasmus

Reply via email to