On Mon, Nov 03, 2025 at 12:34:15PM -0500, David Malcolm wrote:
> On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote:
> > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote:
> > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote:
> > > > I guess this is a deficiency of the SARIF reader in `csgrep`. I get
> > > > the same output with csgrep locally:
> > > >
> > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-results/452-M3DZ.sarif
> > >
> > > Interesting. Looking at 452-M3DZ.sarif I notice that almost all of
> > > the threadFlowLocation objects in the path have a "kinds" property as
> > > per 3.38.8 kinds property:
> > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009
> > > but the one for "(17) if ‘BIO_read’ throws an exception..." *doesn't*
> > > have a "kinds" property. That's just a "MAY contain" property and gcc
> > > won't supply it if no properties are appropriate. Is csgrep perhaps
> > > ignoring threadFlowLocation objects that don't have a "kinds" property?
> >
> > That is exactly the issue. If there is no "kinds" property specified,
> > how would csgrep convert the event to the legacy plain-text format of GCC?
> >
> > In fact, there is a draft pull request that would kind of fix this in
> > csdiff:
> > https://github.com/csutils/csdiff/pull/199/files
> >
> > But it has never been merged because Snyk Code produces SARIF files with
> > too many events without any useful properties provided for them. I do
> > not think that sarif-replay handles these files any better. You can give
> > it a try with this SARIF file, for example:
> > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> >
> > You need to drop the BOM because sarif-replay does not support it:
> >
> > % curl -O https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)
>
> Gahh, thanks for spotting this. The JSON spec says that
> "Implementations MUST NOT add a byte order mark (U+FEFF) to the
> beginning of a networked-transmitted JSON text" but it would be good
> for sarif-replay to gracefully handle this case; I've filed this for
> myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546
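
A side note on the workaround quoted above: `tail -c +4` drops the first three bytes whether or not a BOM is actually there, so on a BOM-less file it would eat the opening `{` and break the JSON. Below is a minimal sketch in Python of a safer filter that removes a UTF-8 BOM (the bytes EF BB BF) only when it is present. The script and function names (`strip_bom.py`, `strip_utf8_bom`) are hypothetical and not part of csgrep, csdiff or sarif-replay:

```python
import sys

# U+FEFF (BYTE ORDER MARK) encoded in UTF-8 is the three bytes EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; leave it untouched otherwise.

    Unlike `tail -c +4`, this never drops bytes from input that has no BOM.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def main(path: str) -> None:
    # Read raw bytes so that no decoding step can hide or alter the BOM.
    with open(path, "rb") as f:
        data = f.read()
    sys.stdout.buffer.write(strip_utf8_bom(data))


# Hypothetical CLI: python3 strip_bom.py FILE  (writes the cleaned bytes to stdout)
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage would then be `sarif-replay <(python3 strip_bom.py 0125-sarif-parser-bom-stdin.txt)`. For code that decodes the text anyway, Python's built-in `utf-8-sig` codec gives the same result: it skips a leading BOM if there is one and otherwise behaves like plain UTF-8.
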

I personally would argue that this is a serious bug in the producer, not
in the consumer. A BOM has absolutely no purpose, none whatsoever, in
files that are *known* to be encoded in UTF-8. Extremely few programs
expect ever to see it; almost every program that consumes UTF-8 files
will handle it incorrectly if it is present in the input file...

...and IMHO this is a good thing.

Of course, I could be prejudiced by the fact that
https://www.ueber.net/who/mjl/projects/bomstrip/ was one of the very
first programs I ever packaged for Debian :)

G'luck,
Peter

-- 
Peter Pentchev  [email protected] [email protected] [email protected]
PGP key:        https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
--
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
