On Tue, 2025-11-04 at 14:38 +0100, Kamil Dudka wrote:
> On Monday, 3 November 2025 18:34:15 CET David Malcolm wrote:
> > On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote:
> > > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote:
> > > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote:
> > > > > I guess this is a deficiency of the SARIF reader in
> > > > > `csgrep`.  I
> > > > > get
> > > > > the same
> > > > > output with csgrep locally:
> > > > > 
> > > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-
> > > > > results/452-M3DZ.sarif
> > > > > 
> > > > 
> > > > Interesting.  Looking at 452-M3DZ.sarif  I notice that almost
> > > > all
> > > > of
> > > > threadFlowLocation objects in the path have a "kinds" property
> > > > as
> > > > per
> > > > 3.38.8 kinds property:
> > > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009
> > > > but the one for "(17) if ‘BIO_read’ throws an exception..."
> > > > *doesn't*
> > > > have a "kinds" property.  That's just a "MAY contain" property
> > > > and
> > > > gcc
> > > > won't supply it if no properties are appropriate.  Is csgrep
> > > > perhaps
> > > > ignoring threadFlowLocation objects that don't have a "kinds"
> > > > property?
> > > 
> > > That is exactly the issue.  If there is no "kind" specified, how
> > > would csgrep
> > > convert the event to the legacy plain-text format of GCC?
> > > 
> > > In fact, there is a draft pull request that would kind of fix
> > > this in
> > > csdiff:
> > > https://github.com/csutils/csdiff/pull/199/files
> > > 
> > > But it has never been merged because Snyk Code produces SARIF
> > > files
> > > with too
> > > many events without any useful properties provided for them.  I
> > > do
> > > not think
> > > that sarif-replay handles these files any better.  You can give
> > > it a
> > > try
> > > with this SARIF file, for example:
> > > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > > 
> > > You need to drop BOM because sarify-replay does not support them:
> > > 
> > > % curl -O
> > > https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)
> > 
> > Gahh, thanks for spotting this.  The JSON spec says that
> > "Implementations MUST NOT add a byte order mark (U+FEFF) to the
> > beginning of a networked-transmitted JSON text" but it would be
> > good
> > for sarif-replay to gracefully handle this case; I've filed this
> > for
> > myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546
> > 
> > 
> > > 
> > > > Arguably it could be good to have property values in the SARIF
> > > > spec
> > > > for
> > > > exception-handling; I've filed this here:
> > > > https://github.com/oasis-tcs/sarif-spec/issues/735
> > > 
> > > Thanks!
> > > 
> > > > > > Is it possible to use sarif-replay there? (I can help if
> > > > > > you
> > > > > > like).
> > > > > 
> > > > > Unfortunately, this is not going to be easy because csmock
> > > > > does
> > > > > not
> > > > > use
> > > > > SARIF internally.  It merges diagnostic output from all the
> > > > > enabled
> > > > > tools
> > > > > into a simple JSON format that throws away a lot of details.
> > > > 
> > > > Right, but in the generated HTML it looks like you're invoking
> > > > csgrep
> > > > to  read the SARIF and print it: presumably you could slot in
> > > > sarif-
> > > > replay and have that print it instead?  Or am I
> > > > misunderstanding
> > > > something?
> > > 
> > > Yes, csmock uses a non-standard intermediate JSON format.  It is
> > > much
> > > more
> > > lightweight format compared to SARIF, which is (or at least was)
> > > needed to
> > > process scan results with too many findings.  Even with the
> > > simple
> > > JSON
> > > format we hit severe performance issues, e.g. when the Gitleaks
> > > tool
> > > hit
> > > a test-suite of a crypto library:
> > > https://github.com/csutils/csdiff/pull/119
> > > https://github.com/csutils/csdiff/pull/134
> > 
> > Might it be possible to generate sarif from the csmock JSON
> > format?  Or
> > has too much information already been lost for the result to be
> > useful?
> 
> The csmock JSON format can be converted to a valid SARIF with csgrep
> but it does not contain the details that sarif-replay needs to
> produce
> a meaningful output:
> 
> % curl -s
> 'https://openscanhub.fedoraproject.org/task/78401/log/sscg-4.0.0-1.fc
> 44.tar.xz?format=raw' | xz -cd | tar -x sscg-4.0.0-1.fc44/debug/raw-
> results/builddir/gcc-results/452-M3DZ.sarif --to-command='csgrep --
> mode=sarif' > 452-M3DZ-csmock.sarif
> 
> The resulting SARIF renders fine with csgrep itself (and services
> like
> GitHub or Defect Dojo):
> 
> % csgrep 452-M3DZ-csmock.sarif
> Error: GCC_ANALYZER_WARNING (CWE-401):
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:118:3: warning[-Wanalyzer-malloc-leak]:
> leak of ‘ext_str’
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:61:6: branch_false: following ‘false’
> branch...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:69:26: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:70:6: branch_false: following ‘false’
> branch...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:77:14: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:78:6: branch_false: following ‘false’
> branch...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:85:9: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:86:6: branch_false: following ‘false’
> branch...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:93:8: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:93:6: branch_false: following ‘false’
> branch...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:101:13: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:102:6: branch_false: following ‘false’
> branch (when ‘ext_len > 0’)...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:109:21: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:109:13: acquire_memory: allocated here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:110:6: branch_false: following ‘false’
> branch (when ‘ext_str’ is non-NULL)...
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:118:3: branch_false: ...to here
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:118:3: danger: ‘ext_str’ leaks here;
> was allocated at
> [(13)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/12
> )
> 
> But it does not work well with sarif-replay at all:
> 
> % sarif-replay 452-M3DZ-csmock.sarif
> /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-
> 4.0.0/test/create_cert_test.c:118:3: warning: leak of ‘ext_str’
> [GCC_ANALYZER_WARNING: warning[-Wanalyzer-malloc-leak]]
>   event 1
>     │
>     │
>     └──> events 2-17
>            │
>            │......
>            │......
>            │......
>            │......
>            │......
>            │......
>            │......
>            │

Gahhh; sorry about this.  I think the issue here is that sarif-replay
can't find the source code, and my path-printing code isn't doing a
good job of handling that.

I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122622 for
myself to take a look at handling this case better.

In your workflow, is the source code still around when this runs (and
so if sarif-replay were given some kind of path to look relative to
this might be fixable [1]), or is it not available (e.g. on a different
machine, or run at some later time after cleanup of the build)?

FWIW gcc's SARIF output automatically embeds a copy of any file
mentioned anywhere in the SARIF, and sarif-replay will use this; see
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=778336e0e4f25745f76a127801dc3bab5e9c1334
for the details.  So the above problem doesn't show up on gcc SARIF
output (but this might be bloating the SARIF output you have to deal
with).

[...snip..]

Dave
[1] sarif-replay doesn't support such relative paths yet, but I can
implement it if it would be useful

-- 
_______________________________________________
devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue

Reply via email to