On Fri, 2026-01-30 at 14:57 -0500, David Malcolm wrote:
> On Fri, 2026-01-30 at 16:27 +0000, Qing Zhao wrote:
> > Hi, David,
> >
> > Thanks a lot for your information. They are very interesting and
> > promising.
>
> Thanks.
>
> > I do have two questions:
> >
> > 1. When you wrote the prototype that embeds SARIF as an ELF section,
> > did you collect any data on the code size increase of the final
> > object files?
>
> I didn't collect realistic data.
>
> FWIW I've uploaded the patch I had to
> https://dmalcolm.fedorapeople.org/gcc/2026-01-30/0001-Initial-proof-of-concept-of-writing-sarif-to-asm-plu.patch
> but it's heavily bit-rotted against trunk.
>
> By way of example, in the same directory is a test.s and a test.o
> generated using the patch on a trivial C file:
>
> $ cat test.c
> int i;
> static int j;
>
> $ ./cc1 -quiet test.c \
>     -fdiagnostics-add-output=sarif:section=.sarif.json,serialization=json \
>     -fdiagnostics-add-output=sarif:section=.sarif.json5,serialization=json5 \
>     -fdiagnostics-add-output=sarif:section=.sarif.cbor,serialization=cbor \
>     -o test.s \
>     -Wall
>
> test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
>     2 | static int j;
>       |            ^
>
> $ as test.s -o test.o
>
> $ for s in json json5 cbor ; do eu-readelf test.o -x .sarif.$s | head ; done
> Hex dump of section [7] '.sarif.json', 2987 bytes at offset 0x1078:
>   0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
>   0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
>   0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
>   0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
>   0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
>   0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
>   0x00000060 2e302e6a 736f6e22 2c0a2022 76657273 .0.json",. "vers
>   0x00000070 696f6e22 3a202232 2e312e30 222c0a20 ion": "2.1.0",. 
> Hex dump of section [6] '.sarif.json5', 2699 bytes at offset 0x5ed:
>   0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
>   0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
>   0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
>   0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
>   0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
>   0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
>   0x00000060 2e302e6a 736f6e22 2c0a2076 65727369 .0.json",. versi
>   0x00000070 6f6e3a20 22322e31 2e30222c 0a207275 on: "2.1.0",. ru
> Hex dump of section [5] '.sarif.cbor', 1410 bytes at offset 0x6b:
>   0x00000000 a3672473 6368656d 61785a68 74747073 .g$schemaxZhttps
>   0x00000010 3a2f2f64 6f63732e 6f617369 732d6f70 ://docs.oasis-op
>   0x00000020 656e2e6f 72672f73 61726966 2f736172 en.org/sarif/sar
>   0x00000030 69662f76 322e312e 302f6572 72617461 if/v2.1.0/errata
>   0x00000040 30312f6f 732f7363 68656d61 732f7361 01/os/schemas/sa
>   0x00000050 7269662d 73636865 6d612d32 2e312e30 rif-schema-2.1.0
>   0x00000060 2e6a736f 6e677665 7273696f 6e65322e .jsongversione2.
>   0x00000070 312e3064 72756e73 81a56474 6f6f6ca1 1.0druns..dtool.
>
> Dumping the sections:
>
> $ objcopy test.o /dev/null --dump-section .sarif.json=/dev/stdout | head
> {"$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json",
>  "version": "2.1.0",
>  "runs": [{"tool": {"driver": {"name": "GNU C23",
>                                "fullName": "GNU C23 (GCC) version 16.0.0 20250505 (experimental) (x86_64-pc-linux-gnu)",
>                                "version": "16.0.0 20250505 (experimental)",
>                                "informationUri": "https://gcc.gnu.org/gcc-16/",
>                                "rules": [{"id": "-Wunused-variable",
>                                           "helpUri": "https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unused-variable"}]}},
>            "invocations": [{"arguments": ["./cc1",
>                                           "-quiet",
>
> $ objcopy test.o /dev/null --dump-section .sarif.cbor=/dev/stdout | cbor2pretty.rb | head
> a3                                      # map(3)
>    67                                   # text(7)
>       24736368656d61                    # "$schema"
>    78 5a                                # text(90)
>       68747470733a2f2f646f63732e6f617369732d6f70656e2e6f72672f73617269662f73617269662f76322e312e302f65727261746130312f6f732f736368656d61732f73617269662d736368656d612d322e312e302e6a736f6e # "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json"
>    67                                   # text(7)
>       76657273696f6e                    # "version"
>    65                                   # text(5)
>       322e312e30                        # "2.1.0"
>    64                                   # text(4)
>
> but I think gzipping the json would be simpler and likely more
> space-efficient than using CBOR.
>
> Replaying the diagnostics in test.o using sarif-replay:
>
> $ objcopy test.o /dev/null --dump-section .sarif.json=tmp.json \
>     | LD_LIBRARY_PATH=. ./sarif-replay tmp.json
> test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
>     2 | static int j;
>       |            ^
>
> So presumably we could do something similar with optimization
> records.
>
> > 2. What are the major concerns when we decide whether to dump the
> > optimization info to a separate file, or embed the optimization info
> > into the object file?
>
> For my use-case, I was thinking of diagnostics and build metadata, as a
> kind of "annobin on steroids".
> I don't know what the pros/cons of embedding vs separate file would be
> for optimization info.
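
On the size question and the gzip idea quoted above: one rough way to
compare the raw and gzipped size of a dumped section might be the
sketch below. The sarif_sizes name is made up for illustration and is
not part of the patch; it only assumes a POSIX shell plus mktemp, gzip
and wc.

```shell
# Hypothetical helper (not part of the patch): read bytes on stdin and
# print their raw size and their size after gzip -9.
sarif_sizes() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    raw=$(wc -c < "$tmp")
    gz=$(gzip -9 -c "$tmp" | wc -c)
    rm -f "$tmp"
    # $((...)) strips the padding some wc implementations emit
    echo "raw=$((raw)) gzip=$((gz))"
}
```

It could be fed from the same objcopy invocation as above, e.g.
"objcopy test.o /dev/null --dump-section .sarif.json=/dev/stdout | sarif_sizes",
and likewise for the .sarif.cbor section to compare against CBOR.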

A more ambitious approach might be to try to encode SARIF as DWARF (via
some extension to DWARF), which might lead to the SARIF being able to
reference specific binary locations, and for a linker to be able to
consolidate repeated strings in the data.

Dave

>
> Dave
>
> >
> > Thanks a lot.
> >
> > Qing
> >
> > > On Jan 29, 2026, at 11:58, David Malcolm <[email protected]>
> > > wrote:
> > >
> > > On Wed, 2026-01-28 at 17:53 -0500, Siddhesh Poyarekar wrote:
> > > > On 2026-01-28 10:41, Qing Zhao via Gcc wrote:
> > > > > Does GCC provide any option to record optimization information,
> > > > > such as inlining, loop transformation, profiling consistency,
> > > > > etc into specific sections of binary code?
> > > >
> > > > I may be misremembering this, but I think David had some ideas
> > > > about doing something like this in SARIF.
> > > >
> > >
> > > Several thoughts here:
> > >
> > > (a) I've written a prototype that embeds SARIF as an ELF section in
> > > the generated object file, rather like debuginfo (my idea at the
> > > time being that a binary could contain within it its build flags
> > > and other metadata, and its diagnostics, etc).  I don't think I
> > > posted it to the mailing list though.
> > >
> > > (b) A long time ago I prototyped a gcc implementation of llvm's
> > > idea of optimization remarks, to send optimization info through the
> > > diagnostics subsystem, but IIRC that work ended up as the revamp of
> > > optinfo (in GCC 9?; see my Cauldron 2018 talk on optimization
> > > records), which generalized some of the internals of how we track
> > > optimization info.  The machine-readable output is a custom
> > > json-based format.
> > >
> > > (c) SARIF would probably be a good fit for optimization records;
> > > it's machine-readable, and has a rich vocabulary for source
> > > locations, code constructs, machine locations, etc; IDEs and other
> > > tooling understand it, so they'd get a source-level view of
> > > optimization info "for free".  Note that currently our SARIF output
> > > captures the contents of every source file referred to by any
> > > diagnostics, but we could e.g. capture every source file/header
> > > used during the compile, and could capture e.g. SHA1 sums rather
> > > than file content.
> > >
> > > (d) I've added the ability to add custom info to diagnostic sinks;
> > > see e.g. capturing CFG information in
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e20eee3897ae8cd0f2212dad0710d64df8f1a956
> > >
> > > (e) I've added a new publish/subscribe framework to GCC for loosely
> > > coupled notifications that would probably help with the
> > > implementation (to avoid needing to have the diagnostics subsystem
> > > "know" too much about the optimizer).
> > >
> > > So possible GCC 17 material might be:
> > >
> > > (d) add a new sink to the optinfo subsystem that adds a new pub/sub
> > > channel about optimization info, and sends notifications about the
> > > optimization records there
> > >
> > > (e) add a new option to -fdiagnostics-add-output to capture
> > > optinfo, which when enabled subscribes the diagnostic sink to the
> > > optinfo notifications channel.  Or we just skip (d) and work more
> > > directly with optinfo, but (d) allows some extra flexibility e.g.
> > > for plugins that listen for optimization decisions.
> > >
> > > (f) potentially add a new option to the SARIF sink to support
> > > embedding the data in an ELF section, rather than writing to a file
> > > (as per (a) above).
> > >
> > > Brainstorming, the user might be able to do something like:
> > >
> > >   -fdiagnostics-add-output=sarif:elf-section=optimizations,optinfo=inline
> > >
> > > or whatnot, and have an ELF section capturing the decisions made by
> > > the inliner.
> > >
> > > Or we could have an option to send optinfo as diagnostics, like
> > > LLVM's optimization records (and (b) above), and have the
> > > diagnostics sinks handle them that way (text, SARIF, HTML).
> > >
> > > Dave
