On Mon, Nov 03, 2025 at 12:34:15PM -0500, David Malcolm wrote:
> On Mon, 2025-11-03 at 18:13 +0100, Kamil Dudka wrote:
> > On Monday, 3 November 2025 17:28:54 CET David Malcolm wrote:
> > > On Mon, 2025-11-03 at 17:03 +0100, Kamil Dudka wrote:
> > > > I guess this is a deficiency of the SARIF reader in `csgrep`.  I get
> > > > the same output with csgrep locally:
> > > > 
> > > > % csgrep sscg-4.0.0-1.fc44/debug/raw-results/builddir/gcc-results/452-M3DZ.sarif
> > > > 
> > > 
> > > Interesting.  Looking at 452-M3DZ.sarif, I notice that almost all of
> > > the threadFlowLocation objects in the path have a "kinds" property as
> > > per section 3.38.8, "kinds property":
> > > https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141791009
> > > but the one for "(17) if ‘BIO_read’ throws an exception..."
> > > *doesn't* have a "kinds" property.  That's just a "MAY contain"
> > > property, and gcc won't supply it if no kinds are appropriate.  Is
> > > csgrep perhaps ignoring threadFlowLocation objects that don't have a
> > > "kinds" property?
> > 
> > That is exactly the issue.  If there is no "kinds" property, how would
> > csgrep convert the event to GCC's legacy plain-text format?
> > 
> > In fact, there is a draft pull request that would kind of fix this in
> > csdiff:
> > https://github.com/csutils/csdiff/pull/199/files
> > 
> > But it has never been merged because Snyk Code produces SARIF files
> > with too many events that lack any useful properties.  I do not think
> > that sarif-replay handles these files any better.  You can give it a
> > try with this SARIF file, for example:
> > https://github.com/csutils/csdiff/blob/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > 
> > You need to drop the BOM because sarif-replay does not support it:
> > 
> > % curl -O https://raw.githubusercontent.com/csutils/csdiff/refs/heads/main/tests/csgrep/0125-sarif-parser-bom-stdin.txt
> > % sarif-replay <(tail -c +4 0125-sarif-parser-bom-stdin.txt)
> 
> Gahh, thanks for spotting this.  The JSON spec (RFC 8259) says that
> "Implementations MUST NOT add a byte order mark (U+FEFF) to the
> beginning of a networked-transmitted JSON text", but it would be good
> for sarif-replay to handle this case gracefully; I've filed this for
> myself as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122546
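
On the "kinds" question above: a consumer can treat the property as optional instead of dropping the event. The following is a minimal Python sketch of that idea, not csgrep's implementation and not necessarily what the draft pull request does. The function `thread_flow_events` and the fallback label "note" are hypothetical, and the sketch assumes the SARIF 2.1.0 nesting runs/results/codeFlows/threadFlows/locations, with the event text in the nested location's message:

```python
import json

# Hypothetical label for events whose threadFlowLocation has no "kinds"
# property; "kinds" is optional in SARIF 2.1.0 (section 3.38.8).
DEFAULT_EVENT_KIND = "note"


def thread_flow_events(sarif: dict):
    """Yield (kind, message) for each threadFlowLocation in a SARIF log,
    including those that lack the optional "kinds" property."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for code_flow in result.get("codeFlows", []):
                for thread_flow in code_flow.get("threadFlows", []):
                    for tfl in thread_flow.get("locations", []):
                        # Fall back to a default instead of skipping the
                        # event; only the first kind is used, for brevity.
                        kinds = tfl.get("kinds") or [DEFAULT_EVENT_KIND]
                        # The event text is the message of the nested location.
                        message = (tfl.get("location", {})
                                      .get("message", {})
                                      .get("text", ""))
                        yield kinds[0], message


if __name__ == "__main__":
    # Made-up two-event log: the second event has no "kinds" property.
    log = json.loads("""
    {"version": "2.1.0", "runs": [{"results": [{"codeFlows": [{"threadFlows": [{
      "locations": [
        {"kinds": ["call"], "location": {"message": {"text": "first event"}}},
        {"location": {"message": {"text": "if 'BIO_read' throws an exception..."}}}
      ]}]}]}]}]}
    """)
    for kind, text in thread_flow_events(log):
        print(f"{kind}: {text}")
```

Whether a fallback label is acceptable in GCC's legacy plain-text format is exactly the open question raised above; the sketch only shows that the event need not be lost.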
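
RFC 8259 also allows parsers to ignore a leading BOM rather than treat it as an error, so "graceful" handling amounts to what the `tail -c +4` workaround does: skip the three bytes EF BB BF when present. A minimal Python sketch of that behaviour follows; `load_sarif` is a hypothetical name, and this is not how sarif-replay itself is implemented:

```python
import codecs
import json


def load_sarif(data: bytes):
    """Parse a SARIF (JSON) document, ignoring a leading UTF-8 BOM."""
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf', the same three bytes that
    # `tail -c +4` skips.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b'{"version": "2.1.0", "runs": []}'
    print(load_sarif(with_bom)["version"])  # prints 2.1.0
```

Decoding with Python's `utf-8-sig` codec would strip the BOM in one step; the explicit check is shown to mirror the shell workaround.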

I personally would argue that this is a serious bug in the producer, not
in the consumer. A BOM has absolutely no purpose, none whatsoever, in
files that are *known* to be encoded in UTF-8. Extremely few programs
expect ever to see one, and almost every program consuming UTF-8 files
will handle it incorrectly if it is present in the input file...

...and IMHO this is a good thing.
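
The consumer side of that argument is easy to reproduce: a program that decodes the file as ordinary UTF-8 sees U+FEFF as the first character, and a strict JSON parser then refuses the text. A minimal sketch using Python's standard `json` module as a stand-in for "a program consuming UTF-8 files"; the helper name `parses_as_json` is made up for this illustration:

```python
import json


def parses_as_json(raw: bytes) -> bool:
    """Return True if `raw`, decoded as plain UTF-8, is accepted by json.loads."""
    # Plain "utf-8" decoding does not strip a BOM: it survives as U+FEFF.
    text = raw.decode("utf-8")
    try:
        json.loads(text)
    except json.JSONDecodeError:
        # CPython's json module rejects a str that starts with U+FEFF.
        return False
    return True


if __name__ == "__main__":
    doc = b'{"version": "2.1.0"}'
    print(parses_as_json(doc))                    # True
    print(parses_as_json(b"\xef\xbb\xbf" + doc))  # False
```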

Of course, I could be prejudiced by the fact that

  https://www.ueber.net/who/mjl/projects/bomstrip/

was one of the very first programs I ever packaged for Debian :)

G'luck,
Peter

-- 
Peter Pentchev
PGP key:        https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13


-- 
_______________________________________________
devel mailing list
