On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote: > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote: > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote: > > > I guess this is a deficiency of the SARIF reader in `csgrep`. I > > > get > > > the same > > > output with csgrep locally: > > > > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc- > > > results/452-M3DZ.sarif > > > > > > > Interesting. Looking at 452-M3DZ.sarif I notice that almost all > > of > > threadFlowLocation objects in the path have a "kinds" property as > > per > > 3.38.8 kinds property: > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009 > > but the one for "(17) if ‘BIO_read’ throws an exception..." > > *doesn't* > > have a "kinds" property. That's just a "MAY contain" property and > > gcc > > won't supply it if no properties are appropriate. Is csgrep > > perhaps > > ignoring threadFlowLocation objects that don't have a "kinds" > > property? > > That is exactly the issue. If there is no "kind" specified, how > would csgrep > convert the event to the legacy plain-text format of GCC? > > In fact, there is a draft pull request that would kind of fix this in > csdiff: > https://github.com/csutils/csdiff/pull/199/files > > But it has never been merged because Snyk Code produces SARIF files > with too > many events without any useful properties provided for them. I do > not think > that sarif-replay handles these files any better. You can give it a > try > with this SARIF file, for example: > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > > You need to drop BOM because sarify-replay does not support them: > > % curl -O > https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)
Gahh, thanks for spotting this. The JSON spec says that "Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text" but it would be good for sarif-replay to gracefully handle this case; I've filed this for myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546 > > > Arguably it could be good to have property values in the SARIF spec > > for > > exception-handling; I've filed this here: > > https://github.com/oasis-tcs/sarif-spec/issues/735 > > Thanks! > > > > > Is it possible to use sarif-replay there? (I can help if you > > > > like). > > > > > > Unfortunately, this is not going to be easy because csmock does > > > not > > > use > > > SARIF internally. It merges diagnostic output from all the > > > enabled > > > tools > > > into a simple JSON format that throws away a lot of details. > > > > Right, but in the generated HTML it looks like you're invoking > > csgrep > > to read the SARIF and print it: presumably you could slot in > > sarif- > > replay and have that print it instead? Or am I misunderstanding > > something? > > Yes, csmock uses a non-standard intermediate JSON format. It is much > more > lightweight format compared to SARIF, which is (or at least was) > needed to > process scan results with too many findings. Even with the simple > JSON > format we hit severe performance issues, e.g. when the Gitleaks tool > hit > a test-suite of a crypto library: > https://github.com/csutils/csdiff/pull/119 > https://github.com/csutils/csdiff/pull/134 Might it be possible to generate sarif from the csmock JSON format? Or has too much information already been lost for the result to be useful? FWIW I experimented with CBOR as an alternate encoding for SARIF, but the savings I got didn't seem worth the bother. Perhaps something like protobuffers (or CapnProto) might be worth looking into (but I don't have time for that in GCC 16) Dave > > > > In theory, csmock could be reworked to use SARIF internally but > > > it > > > would > > > not be a smooth transition and it would introduce a performance > > > penalty > > > on scans with a lot of findings in the results. > > > > > > > But note that GCC 16 and its > > > > sarif-replay also has HTML output now, as per: > > > > > > > > https://dmalcolm.fedorapeople.org/gcc/2025-11-03/452-M3DZ.html > > > > below (and potentially can show extra debugging info when > > > > stepping > > > > though the code with "j" and "k" if you supply "state- > > > > graphs=yes" > > > > when > > > > adding the SARIF sink): > > > > -fdiagnostics-add-output=sarif:state-graphs=yes > > > > see the example at: > > > > https://dmalcolm.fedorapeople.org/gcc/2025-06-23/state-diagram-1.c.html > > > > > > This is cool. I like the visualization. > > > > > > > Dave > > > > > > > > > > > > > > Looking in the SARIF file, I see "-fexceptions" in the > > > > > command- > > > > > line > > > > > flags. Is that deliberate? Is this code meant to support > > > > > exceptions? > > > > > > Good point. Stephen, could you please comment on that? > > > > I've been advising people not to use -fanalyzer with GCC 15 and > > earlier > > on C++ code because of exception-handling problems. I suspect > > similar > > problems could arise on C code compiled with -fexceptions. > > I see. OpenScanHub does not (deliberately) use GCC Analyzer for C++ > yet. -- _______________________________________________ devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/[email protected] Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
