On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote:
> On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote:
> > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote:
> > > I guess this is a deficiency of the SARIF reader in `csgrep`.  I
> > > get
> > > the same
> > > output with csgrep locally:
> > > 
> > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-
> > > results/452-M3DZ.sarif
> > > 
> > 
> > Interesting.  Looking at 452-M3DZ.sarif  I notice that almost all
> > of
> > threadFlowLocation objects in the path have a "kinds" property as
> > per
> > 3.38.8 kinds property:
> > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009
> > but the one for "(17) if ‘BIO_read’ throws an exception..."
> > *doesn't*
> > have a "kinds" property.  That's just a "MAY contain" property and
> > gcc
> > won't supply it if no properties are appropriate.  Is csgrep
> > perhaps
> > ignoring threadFlowLocation objects that don't have a "kinds"
> > property?
> 
> That is exactly the issue.  If there is no "kind" specified, how
> would csgrep
> convert the event to the legacy plain-text format of GCC?
> 
> In fact, there is a draft pull request that would kind of fix this in
> csdiff:
> https://github.com/csutils/csdiff/pull/199/files
> 
> But it has never been merged because Snyk Code produces SARIF files
> with too
> many events without any useful properties provided for them.  I do
> not think
> that sarif-replay handles these files any better.  You can give it a
> try
> with this SARIF file, for example:
> https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> 
> You need to drop BOM because sarify-replay does not support them:
> 
> % curl -O
> https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)

Gahh, thanks for spotting this.  The JSON spec says that
"Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a networked-transmitted JSON text" but it would be good
for sarif-replay to gracefully handle this case; I've filed this for
myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546


> 
> > Arguably it could be good to have property values in the SARIF spec
> > for
> > exception-handling; I've filed this here:
> > https://github.com/oasis-tcs/sarif-spec/issues/735
> 
> Thanks!
> 
> > > > Is it possible to use sarif-replay there? (I can help if you
> > > > like).
> > > 
> > > Unfortunately, this is not going to be easy because csmock does
> > > not
> > > use
> > > SARIF internally.  It merges diagnostic output from all the
> > > enabled
> > > tools
> > > into a simple JSON format that throws away a lot of details.
> > 
> > Right, but in the generated HTML it looks like you're invoking
> > csgrep
> > to  read the SARIF and print it: presumably you could slot in
> > sarif-
> > replay and have that print it instead?  Or am I misunderstanding
> > something?
> 
> Yes, csmock uses a non-standard intermediate JSON format.  It is much
> more
> lightweight format compared to SARIF, which is (or at least was)
> needed to
> process scan results with too many findings.  Even with the simple
> JSON
> format we hit severe performance issues, e.g. when the Gitleaks tool
> hit
> a test-suite of a crypto library:
> https://github.com/csutils/csdiff/pull/119
> https://github.com/csutils/csdiff/pull/134

Might it be possible to generate sarif from the csmock JSON format?  Or
has too much information already been lost for the result to be useful?

FWIW I experimented with CBOR as an alternate encoding for SARIF, but
the savings I got didn't seem worth the bother.  Perhaps something like
protobuffers (or CapnProto) might be worth looking into (but I don't
have time for that in GCC 16)

Dave

> 
> > > In theory, csmock could be reworked to use SARIF internally but
> > > it
> > > would
> > > not be a smooth transition and it would introduce a performance
> > > penalty
> > > on scans with a lot of findings in the results.
> > > 
> > > > But note that GCC 16 and its
> > > > sarif-replay also has HTML output now, as per:
> > > >  
> > > > https://dmalcolm.fedorapeople.org/gcc/2025-11-03/452-M3DZ.html
> > > > below (and potentially can show extra debugging info when
> > > > stepping
> > > > though the code with "j" and "k" if you supply "state-
> > > > graphs=yes"
> > > > when
> > > > adding the SARIF sink):
> > > >   -fdiagnostics-add-output=sarif:state-graphs=yes
> > > > see the example at:
> > > > https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html
> > > 
> > > This is cool.  I like the visualization.
> > > 
> > > > Dave
> > > > 
> > > > > 
> > > > > Looking in the SARIF file, I see "-fexceptions" in the
> > > > > command-
> > > > > line
> > > > > flags.  Is that deliberate?  Is this code meant to support
> > > > > exceptions?
> > > 
> > > Good point.  Stephen, could you please comment on that?
> > 
> > I've been advising people not to use -fanalyzer with GCC 15 and
> > earlier
> > on C++ code because of exception-handling problems.  I suspect
> > similar
> > problems could arise on C code compiled with -fexceptions.
> 
> I see.  OpenScanHub does not (deliberately) use GCC Analyzer for C++
> yet.

-- 
_______________________________________________
devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue

Reply via email to