On Tue, 2025-11-04 at 14:38 +0100, Kamil Dudka wrote: > On Monday, 3 November 2025 18:34:15 CET David Malcolm wrote: > > On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote: > > > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote: > > > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote: > > > > > I guess this is a deficiency of the SARIF reader in > > > > > `csgrep`. I > > > > > get > > > > > the same > > > > > output with csgrep locally: > > > > > > > > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc- > > > > > results/452-M3DZ.sarif > > > > > > > > > > > > > Interesting. Looking at 452-M3DZ.sarif I notice that almost > > > > all > > > > of > > > > threadFlowLocation objects in the path have a "kinds" property > > > > as > > > > per > > > > 3.38.8 kinds property: > > > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009 > > > > but the one for "(17) if ‘BIO_read’ throws an exception..." > > > > *doesn't* > > > > have a "kinds" property. That's just a "MAY contain" property > > > > and > > > > gcc > > > > won't supply it if no properties are appropriate. Is csgrep > > > > perhaps > > > > ignoring threadFlowLocation objects that don't have a "kinds" > > > > property? > > > > > > That is exactly the issue. If there is no "kind" specified, how > > > would csgrep > > > convert the event to the legacy plain-text format of GCC? > > > > > > In fact, there is a draft pull request that would kind of fix > > > this in > > > csdiff: > > > https://github.com/csutils/csdiff/pull/199/files > > > > > > But it has never been merged because Snyk Code produces SARIF > > > files > > > with too > > > many events without any useful properties provided for them. I > > > do > > > not think > > > that sarif-replay handles these files any better. You can give > > > it a > > > try > > > with this SARIF file, for example: > > > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > > > > > > You need to drop BOM because sarify-replay does not support them: > > > > > > % curl -O > > > https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > > > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt) > > > > Gahh, thanks for spotting this. The JSON spec says that > > "Implementations MUST NOT add a byte order mark (U+FEFF) to the > > beginning of a networked-transmitted JSON text" but it would be > > good > > for sarif-replay to gracefully handle this case; I've filed this > > for > > myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546 > > > > > > > > > > > Arguably it could be good to have property values in the SARIF > > > > spec > > > > for > > > > exception-handling; I've filed this here: > > > > https://github.com/oasis-tcs/sarif-spec/issues/735 > > > > > > Thanks! > > > > > > > > > Is it possible to use sarif-replay there? (I can help if > > > > > > you > > > > > > like). > > > > > > > > > > Unfortunately, this is not going to be easy because csmock > > > > > does > > > > > not > > > > > use > > > > > SARIF internally. It merges diagnostic output from all the > > > > > enabled > > > > > tools > > > > > into a simple JSON format that throws away a lot of details. > > > > > > > > Right, but in the generated HTML it looks like you're invoking > > > > csgrep > > > > to read the SARIF and print it: presumably you could slot in > > > > sarif- > > > > replay and have that print it instead? Or am I > > > > misunderstanding > > > > something? > > > > > > Yes, csmock uses a non-standard intermediate JSON format. It is > > > much > > > more > > > lightweight format compared to SARIF, which is (or at least was) > > > needed to > > > process scan results with too many findings. Even with the > > > simple > > > JSON > > > format we hit severe performance issues, e.g. when the Gitleaks > > > tool > > > hit > > > a test-suite of a crypto library: > > > https://github.com/csutils/csdiff/pull/119 > > > https://github.com/csutils/csdiff/pull/134 > > > > Might it be possible to generate sarif from the csmock JSON > > format? Or > > has too much information already been lost for the result to be > > useful? > > The csmock JSON format can be converted to a valid SARIF with csgrep > but it does not contain the details that sarif-replay needs to > produce > a meaningful output: > > % curl -s > 'https://openscanhub.fedoraproject.org/task/78401/log/sscg-4.0.0-1.fc > 44.tar.xz?format=raw' | xz -cd | tar -x sscg-4.0.0-1.fc44/debug/raw- > results/builddir/gcc-results/452-M3DZ.sarif --to-command='csgrep -- > mode=sarif' > 452-M3DZ-csmock.sarif > > The resulting SARIF renders fine with csgrep itself (and services > like > GitHub or Defect Dojo): > > % csgrep 452-M3DZ-csmock.sarif > Error: GCC_ANALYZER_WARNING (CWE-401): > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:118:3: warning[-Wanalyzer-malloc-leak]: > leak of ‘ext_str’ > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:61:6: branch_false: following ‘false’ > branch... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:69:26: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:70:6: branch_false: following ‘false’ > branch... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:77:14: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:78:6: branch_false: following ‘false’ > branch... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:85:9: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:86:6: branch_false: following ‘false’ > branch... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:93:8: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:93:6: branch_false: following ‘false’ > branch... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:101:13: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:102:6: branch_false: following ‘false’ > branch (when ‘ext_len > 0’)... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:109:21: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:109:13: acquire_memory: allocated here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:110:6: branch_false: following ‘false’ > branch (when ‘ext_str’ is non-NULL)... > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:118:3: branch_false: ...to here > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:118:3: danger: ‘ext_str’ leaks here; > was allocated at > [(13)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/12 > ) > > But it does not work well with sarif-replay at all: > > % sarif-replay 452-M3DZ-csmock.sarif > /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg- > 4.0.0/test/create_cert_test.c:118:3: warning: leak of ‘ext_str’ > [GCC_ANALYZER_WARNING: warning[-Wanalyzer-malloc-leak]] > event 1 > │ > │ > └──> events 2-17 > │ > │...... > │...... > │...... > │...... > │...... > │...... > │...... > │
Gahhh; sorry about this. I think the issue here is that sarif-replay can't find the source code, and my path-printing code isn't doing a good job of handling that. I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122622 for myself to take a look at handling this case better. In your workflow, is the source code still around when this runs (and so if sarif-replay were given some kind of path to look relative to this might be fixable [1]), or is it not available (e.g. on a different machine, or run at some later time after cleanup of the build)? FWIW gcc's SARIF output automatically embeds a copy of any file mentioned anywhere in the SARIF, and sarif-replay will use this; see https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=778336e0e4f25745f76a127801dc3bab5e9c1334 for the details. So the above problem doesn't show up on gcc SARIF output (but this might be bloating the SARIF output you have to deal with). [...snip..] Dave [1] sarif-replay doesn't support such relative paths yet, but I can implement it if it would be useful -- _______________________________________________ devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/[email protected] Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
