On Monday, 3 November 2025 18:34:15 CET David Malcolm wrote:
> On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote:
> > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote:
> > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote:
> > > > I guess this is a deficiency of the SARIF reader in `csgrep`.  I
> > > > get
> > > > the same
> > > > output with csgrep locally:
> > > > 
> > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-
> > > > results/452-M3DZ.sarif
> > > > 
> > > 
> > > Interesting.  Looking at 452-M3DZ.sarif  I notice that almost all
> > > of
> > > threadFlowLocation objects in the path have a "kinds" property as
> > > per
> > > 3.38.8 kinds property:
> > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009
> > > but the one for "(17) if ‘BIO_read’ throws an exception..."
> > > *doesn't*
> > > have a "kinds" property.  That's just a "MAY contain" property and
> > > gcc
> > > won't supply it if no properties are appropriate.  Is csgrep
> > > perhaps
> > > ignoring threadFlowLocation objects that don't have a "kinds"
> > > property?
> > 
> > That is exactly the issue.  If there is no "kind" specified, how
> > would csgrep
> > convert the event to the legacy plain-text format of GCC?
> > 
> > In fact, there is a draft pull request that would kind of fix this in
> > csdiff:
> > https://github.com/csutils/csdiff/pull/199/files
> > 
> > But it has never been merged because Snyk Code produces SARIF files
> > with too
> > many events without any useful properties provided for them.  I do
> > not think
> > that sarif-replay handles these files any better.  You can give it a
> > try
> > with this SARIF file, for example:
> > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > 
> > You need to drop BOM because sarify-replay does not support them:
> > 
> > % curl -O
> > https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)
> 
> Gahh, thanks for spotting this.  The JSON spec says that
> "Implementations MUST NOT add a byte order mark (U+FEFF) to the
> beginning of a networked-transmitted JSON text" but it would be good
> for sarif-replay to gracefully handle this case; I've filed this for
> myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546
> 
> 
> > 
> > > Arguably it could be good to have property values in the SARIF spec
> > > for
> > > exception-handling; I've filed this here:
> > > https://github.com/oasis-tcs/sarif-spec/issues/735
> > 
> > Thanks!
> > 
> > > > > Is it possible to use sarif-replay there? (I can help if you
> > > > > like).
> > > > 
> > > > Unfortunately, this is not going to be easy because csmock does
> > > > not
> > > > use
> > > > SARIF internally.  It merges diagnostic output from all the
> > > > enabled
> > > > tools
> > > > into a simple JSON format that throws away a lot of details.
> > > 
> > > Right, but in the generated HTML it looks like you're invoking
> > > csgrep
> > > to  read the SARIF and print it: presumably you could slot in
> > > sarif-
> > > replay and have that print it instead?  Or am I misunderstanding
> > > something?
> > 
> > Yes, csmock uses a non-standard intermediate JSON format.  It is much
> > more
> > lightweight format compared to SARIF, which is (or at least was)
> > needed to
> > process scan results with too many findings.  Even with the simple
> > JSON
> > format we hit severe performance issues, e.g. when the Gitleaks tool
> > hit
> > a test-suite of a crypto library:
> > https://github.com/csutils/csdiff/pull/119
> > https://github.com/csutils/csdiff/pull/134
> 
> Might it be possible to generate sarif from the csmock JSON format?  Or
> has too much information already been lost for the result to be useful?

The csmock JSON format can be converted to a valid SARIF with csgrep
but it does not contain the details that sarif-replay needs to produce
a meaningful output:

% curl -s 
'https://openscanhub.fedoraproject.org/task/78401/log/sscg-4.0.0-1.fc44.tar.xz?format=raw'
 | xz -cd | tar -x 
sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-results/452-M3DZ.sarif 
--to-command='csgrep --mode=sarif' > 452-M3DZ-csmock.sarif

The resulting SARIF renders fine with csgrep itself (and services like
GitHub or Defect Dojo):

% csgrep 452-M3DZ-csmock.sarif
Error: GCC_ANALYZER_WARNING (CWE-401):
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3:
 warning[-Wanalyzer-malloc-leak]: leak of ‘ext_str’
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:61:6:
 branch_false: following ‘false’ branch...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:69:26:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:70:6:
 branch_false: following ‘false’ branch...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:77:14:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:78:6:
 branch_false: following ‘false’ branch...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:85:9:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:86:6:
 branch_false: following ‘false’ branch...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:93:8:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:93:6:
 branch_false: following ‘false’ branch...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:101:13:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:102:6:
 branch_false: following ‘false’ branch (when ‘ext_len > 0’)...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:109:21:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:109:13:
 acquire_memory: allocated here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:110:6:
 branch_false: following ‘false’ branch (when ‘ext_str’ is non-NULL)...
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3:
 branch_false: ...to here
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3:
 danger: ‘ext_str’ leaks here; was allocated at 
[(13)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/12)

But it does not work well with sarif-replay at all:

% sarif-replay 452-M3DZ-csmock.sarif
/builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3:
 warning: leak of ‘ext_str’ [GCC_ANALYZER_WARNING: 
warning[-Wanalyzer-malloc-leak]]
  event 1
    │
    │
    └──> events 2-17
           │
           │......
           │......
           │......
           │......
           │......
           │......
           │......
           │

> FWIW I experimented with CBOR as an alternate encoding for SARIF, but
> the savings I got didn't seem worth the bother.  Perhaps something like
> protobuffers (or CapnProto) might be worth looking into (but I don't
> have time for that in GCC 16)

When we were processing the overall results of a RHEL-9 mass scan (merged
from all scanned RPM packages), it took minutes to process the simple JSON
format.  csgrep uses <boost/property_tree/json_parser.hpp>, which needs to
read the whole JSON tree into memory before it can be accessed.  I do not
think this would work well with SARIF because of the cross-references (unless
the mentioned formats implement stream processing or lazy loading of nodes).
Defect Dojo uses a database to hold the data but it did not work well with
this amount of data either.

Kamil

> Dave
> 
> > 
> > > > In theory, csmock could be reworked to use SARIF internally but
> > > > it
> > > > would
> > > > not be a smooth transition and it would introduce a performance
> > > > penalty
> > > > on scans with a lot of findings in the results.
> > > > 
> > > > > But note that GCC 16 and its
> > > > > sarif-replay also has HTML output now, as per:
> > > > >  
> > > > > https://dmalcolm.fedorapeople.org/gcc/2025-11-03/452-M3DZ.html
> > > > > below (and potentially can show extra debugging info when
> > > > > stepping
> > > > > though the code with "j" and "k" if you supply "state-
> > > > > graphs=yes"
> > > > > when
> > > > > adding the SARIF sink):
> > > > >   -fdiagnostics-add-output=sarif:state-graphs=yes
> > > > > see the example at:
> > > > > https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html
> > > > 
> > > > This is cool.  I like the visualization.
> > > > 
> > > > > Dave
> > > > > 
> > > > > > 
> > > > > > Looking in the SARIF file, I see "-fexceptions" in the
> > > > > > command-
> > > > > > line
> > > > > > flags.  Is that deliberate?  Is this code meant to support
> > > > > > exceptions?
> > > > 
> > > > Good point.  Stephen, could you please comment on that?
> > > 
> > > I've been advising people not to use -fanalyzer with GCC 15 and
> > > earlier
> > > on C++ code because of exception-handling problems.  I suspect
> > > similar
> > > problems could arise on C code compiled with -fexceptions.
> > 
> > I see.  OpenScanHub does not (deliberately) use GCC Analyzer for C++
> > yet.


-- 
_______________________________________________
devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue

Reply via email to