On Monday, 3 November 2025 18:34:15 CET David Malcolm wrote: > On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote: > > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote: > > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote: > > > > I guess this is a deficiency of the SARIF reader in `csgrep`. I > > > > get > > > > the same > > > > output with csgrep locally: > > > > > > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc- > > > > results/452-M3DZ.sarif > > > > > > > > > > Interesting. Looking at 452-M3DZ.sarif I notice that almost all > > > of > > > threadFlowLocation objects in the path have a "kinds" property as > > > per > > > 3.38.8 kinds property: > > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009 > > > but the one for "(17) if ‘BIO_read’ throws an exception..." > > > *doesn't* > > > have a "kinds" property. That's just a "MAY contain" property and > > > gcc > > > won't supply it if no properties are appropriate. Is csgrep > > > perhaps > > > ignoring threadFlowLocation objects that don't have a "kinds" > > > property? > > > > That is exactly the issue. If there is no "kind" specified, how > > would csgrep > > convert the event to the legacy plain-text format of GCC? > > > > In fact, there is a draft pull request that would kind of fix this in > > csdiff: > > https://github.com/csutils/csdiff/pull/199/files > > > > But it has never been merged because Snyk Code produces SARIF files > > with too > > many events without any useful properties provided for them. I do > > not think > > that sarif-replay handles these files any better. You can give it a > > try > > with this SARIF file, for example: > > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > > > > You need to drop BOM because sarify-replay does not support them: > > > > % curl -O > > https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt) > > Gahh, thanks for spotting this. The JSON spec says that > "Implementations MUST NOT add a byte order mark (U+FEFF) to the > beginning of a networked-transmitted JSON text" but it would be good > for sarif-replay to gracefully handle this case; I've filed this for > myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546 > > > > > > > Arguably it could be good to have property values in the SARIF spec > > > for > > > exception-handling; I've filed this here: > > > https://github.com/oasis-tcs/sarif-spec/issues/735 > > > > Thanks! > > > > > > > Is it possible to use sarif-replay there? (I can help if you > > > > > like). > > > > > > > > Unfortunately, this is not going to be easy because csmock does > > > > not > > > > use > > > > SARIF internally. It merges diagnostic output from all the > > > > enabled > > > > tools > > > > into a simple JSON format that throws away a lot of details. > > > > > > Right, but in the generated HTML it looks like you're invoking > > > csgrep > > > to read the SARIF and print it: presumably you could slot in > > > sarif- > > > replay and have that print it instead? Or am I misunderstanding > > > something? > > > > Yes, csmock uses a non-standard intermediate JSON format. It is much > > more > > lightweight format compared to SARIF, which is (or at least was) > > needed to > > process scan results with too many findings. Even with the simple > > JSON > > format we hit severe performance issues, e.g. when the Gitleaks tool > > hit > > a test-suite of a crypto library: > > https://github.com/csutils/csdiff/pull/119 > > https://github.com/csutils/csdiff/pull/134 > > Might it be possible to generate sarif from the csmock JSON format? Or > has too much information already been lost for the result to be useful?
The csmock JSON format can be converted to a valid SARIF with csgrep but it does not contain the details that sarif-replay needs to produce a meaningful output: % curl -s 'https://openscanhub.fedoraproject.org/task/78401/log/sscg-4.0.0-1.fc44.tar.xz?format=raw' | xz -cd | tar -x sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-results/452-M3DZ.sarif --to-command='csgrep --mode=sarif' > 452-M3DZ-csmock.sarif The resulting SARIF renders fine with csgrep itself (and services like GitHub or Defect Dojo): % csgrep 452-M3DZ-csmock.sarif Error: GCC_ANALYZER_WARNING (CWE-401): /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3: warning[-Wanalyzer-malloc-leak]: leak of ‘ext_str’ /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:61:6: branch_false: following ‘false’ branch... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:69:26: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:70:6: branch_false: following ‘false’ branch... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:77:14: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:78:6: branch_false: following ‘false’ branch... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:85:9: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:86:6: branch_false: following ‘false’ branch... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:93:8: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:93:6: branch_false: following ‘false’ branch... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:101:13: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:102:6: branch_false: following ‘false’ branch (when ‘ext_len > 0’)... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:109:21: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:109:13: acquire_memory: allocated here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:110:6: branch_false: following ‘false’ branch (when ‘ext_str’ is non-NULL)... /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3: branch_false: ...to here /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3: danger: ‘ext_str’ leaks here; was allocated at [(13)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/12) But it does not work well with sarif-replay at all: % sarif-replay 452-M3DZ-csmock.sarif /builddir/build/BUILD/sscg-4.0.0-build/sscg-sscg-4.0.0/test/create_cert_test.c:118:3: warning: leak of ‘ext_str’ [GCC_ANALYZER_WARNING: warning[-Wanalyzer-malloc-leak]] event 1 │ │ └──> events 2-17 │ │...... │...... │...... │...... │...... │...... │...... │ > FWIW I experimented with CBOR as an alternate encoding for SARIF, but > the savings I got didn't seem worth the bother. Perhaps something like > protobuffers (or CapnProto) might be worth looking into (but I don't > have time for that in GCC 16) When we were processing the overall results of a RHEL-9 mass scan (merged from all scanned RPM packages), it took minutes to process the simple JSON format. csgrep uses <boost/property_tree/json_parser.hpp>, which needs to read the whole JSON tree into memory before it can be accessed. I do not think this would work well with SARIF because of the cross-references (unless the mentioned formats implement stream processing or lazy loading of nodes). Defect Dojo uses a database to hold the data but it did not work well with this amount of data either. Kamil > Dave > > > > > > > In theory, csmock could be reworked to use SARIF internally but > > > > it > > > > would > > > > not be a smooth transition and it would introduce a performance > > > > penalty > > > > on scans with a lot of findings in the results. > > > > > > > > > But note that GCC 16 and its > > > > > sarif-replay also has HTML output now, as per: > > > > > > > > > > https://dmalcolm.fedorapeople.org/gcc/2025-11-03/452-M3DZ.html > > > > > below (and potentially can show extra debugging info when > > > > > stepping > > > > > though the code with "j" and "k" if you supply "state- > > > > > graphs=yes" > > > > > when > > > > > adding the SARIF sink): > > > > > -fdiagnostics-add-output=sarif:state-graphs=yes > > > > > see the example at: > > > > > https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html > > > > > > > > This is cool. I like the visualization. > > > > > > > > > Dave > > > > > > > > > > > > > > > > > Looking in the SARIF file, I see "-fexceptions" in the > > > > > > command- > > > > > > line > > > > > > flags. Is that deliberate? Is this code meant to support > > > > > > exceptions? > > > > > > > > Good point. Stephen, could you please comment on that? > > > > > > I've been advising people not to use -fanalyzer with GCC 15 and > > > earlier > > > on C++ code because of exception-handling problems. I suspect > > > similar > > > problems could arise on C code compiled with -fexceptions. > > > > I see. OpenScanHub does not (deliberately) use GCC Analyzer for C++ > > yet. -- _______________________________________________ devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/[email protected] Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
