On Fri, 2026-01-30 at 16:27 +0000, Qing Zhao wrote:
> Hi, David,
>
> Thanks a lot for your information. They are very interesting and
> promising.

Thanks.

>
> I do have two questions:
>
> 1. When you wrote the prototype that embeds SARIF as an ELF section,
> did you collect any data on the code size increase of the final
> object files?

I didn't collect realistic data.  FWIW I've uploaded the patch I had to
https://dmalcolm.fedorapeople.org/gcc/2026-01-30/0001-Initial-proof-of-concept-of-writing-sarif-to-asm-plu.patch
but it's heavily bit-rotted against trunk.

By way of example, in the same directory is a test.s and a test.o
generated using the patch on a trivial C file:

$ cat test.c
int i;
static int j;

$ ./cc1 -quiet test.c \
    -fdiagnostics-add-output=sarif:section=.sarif.json,serialization=json \
    -fdiagnostics-add-output=sarif:section=.sarif.json5,serialization=json5 \
    -fdiagnostics-add-output=sarif:section=.sarif.cbor,serialization=cbor \
    -o test.s \
    -Wall
test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
    2 | static int j;
      |            ^

$ as test.s -o test.o

$ for s in json json5 cbor ; do eu-readelf test.o -x .sarif.$s | head ; done

Hex dump of section [7] '.sarif.json', 2987 bytes at offset 0x1078:
  0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
  0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
  0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
  0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
  0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
  0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
  0x00000060 2e302e6a 736f6e22 2c0a2022 76657273 .0.json",. "vers
  0x00000070 696f6e22 3a202232 2e312e30 222c0a20 ion": "2.1.0",.

Hex dump of section [6] '.sarif.json5', 2699 bytes at offset 0x5ed:
  0x00000000 7b222473 6368656d 61223a20 22687474 {"$schema": "htt
  0x00000010 70733a2f 2f646f63 732e6f61 7369732d ps://docs.oasis-
  0x00000020 6f70656e 2e6f7267 2f736172 69662f73 open.org/sarif/s
  0x00000030 61726966 2f76322e 312e302f 65727261 arif/v2.1.0/erra
  0x00000040 74613031 2f6f732f 73636865 6d61732f ta01/os/schemas/
  0x00000050 73617269 662d7363 68656d61 2d322e31 sarif-schema-2.1
  0x00000060 2e302e6a 736f6e22 2c0a2076 65727369 .0.json",. versi
  0x00000070 6f6e3a20 22322e31 2e30222c 0a207275 on: "2.1.0",. ru

Hex dump of section [5] '.sarif.cbor', 1410 bytes at offset 0x6b:
  0x00000000 a3672473 6368656d 61785a68 74747073 .g$schemaxZhttps
  0x00000010 3a2f2f64 6f63732e 6f617369 732d6f70 ://docs.oasis-op
  0x00000020 656e2e6f 72672f73 61726966 2f736172 en.org/sarif/sar
  0x00000030 69662f76 322e312e 302f6572 72617461 if/v2.1.0/errata
  0x00000040 30312f6f 732f7363 68656d61 732f7361 01/os/schemas/sa
  0x00000050 7269662d 73636865 6d612d32 2e312e30 rif-schema-2.1.0
  0x00000060 2e6a736f 6e677665 7273696f 6e65322e .jsongversione2.
  0x00000070 312e3064 72756e73 81a56474 6f6f6ca1 1.0druns..dtool.
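
As an aside, the first row of the '.sarif.cbor' dump above can be decoded by hand. The following is a minimal Python sketch, not part of the patch: the helper name read_head is made up for illustration, and it only handles the two CBOR head encodings (RFC 8949) that occur in these 16 bytes.

```python
# First 16 bytes of the '.sarif.cbor' section, copied from the hex dump.
data = bytes.fromhex("a3672473 6368656d 61785a68 74747073")


def read_head(buf, pos):
    """Decode one CBOR item head; return (major_type, value, next_pos).

    Hypothetical helper for illustration: it only supports values stored
    in the initial byte (< 24) or in one following byte (additional
    info 24), which is all this prefix needs.
    """
    initial = buf[pos]
    major = initial >> 5      # top 3 bits: major type
    info = initial & 0x1F     # low 5 bits: additional information
    if info < 24:
        return major, info, pos + 1
    if info == 24:
        return major, buf[pos + 1], pos + 2
    raise NotImplementedError("longer encodings are not needed here")


major, npairs, pos = read_head(data, 0)      # a3    -> major type 5 (map), 3 pairs
major_k, klen, pos = read_head(data, pos)    # 67    -> major type 3 (text), 7 bytes
key = data[pos:pos + klen].decode("utf-8")
pos += klen
major_v, vlen, pos = read_head(data, pos)    # 78 5a -> major type 3 (text), 90 bytes
prefix = data[pos:].decode("utf-8")          # start of the 90-byte value

print(major, npairs, key, vlen, prefix)
```

So the section starts with a three-entry map whose first key is "$schema", followed by a 90-byte text string beginning with "https".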

Dumping the sections:

$ objcopy test.o /dev/null --dump-section .sarif.json=/dev/stdout | head
{"$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json",
 "version": "2.1.0",
 "runs": [{"tool": {"driver": {"name": "GNU C23",
                               "fullName": "GNU C23 (GCC) version 16.0.0 20250505 (experimental) (x86_64-pc-linux-gnu)",
                               "version": "16.0.0 20250505 (experimental)",
                               "informationUri": "https://gcc.gnu.org/gcc-16/",
                               "rules": [{"id": "-Wunused-variable",
                                          "helpUri": "https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unused-variable"}]}},
           "invocations": [{"arguments": ["./cc1",
                                          "-quiet",

$ objcopy test.o /dev/null --dump-section .sarif.cbor=/dev/stdout | cbor2pretty.rb | head
a3                                      # map(3)
   67                                   # text(7)
      24736368656d61                    # "$schema"
   78 5a                                # text(90)
      68747470733a2f2f646f63732e6f617369732d6f70656e2e6f72672f73617269662f73617269662f76322e312e302f65727261746130312f6f732f736368656d61732f73617269662d736368656d612d322e312e302e6a736f6e # "https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/schemas/sarif-schema-2.1.0.json"
   67                                   # text(7)
      76657273696f6e                    # "version"
   65                                   # text(5)
      322e312e30                        # "2.1.0"
   64                                   # text(4)

but I think gzipping the json would be simpler and likely more
space-efficient than using CBOR.

Replaying the diagnostics in test.o using sarif-replay:

$ objcopy test.o /dev/null --dump-section .sarif.json=tmp.json \
  | LD_LIBRARY_PATH=. ./sarif-replay tmp.json
test.c:2:12: warning: ‘j’ defined but not used [-Wunused-variable]
    2 | static int j;
      |            ^

So presumably we could do something similar with optimization records.

> 2. What are the major concerns when we decide whether to dump the
> optimization info to a separate file, or embed the optimization info
> into the object file?

For my use-case, I was thinking of diagnostics and build metadata, as a
kind of "annobin on steroids".  I don't know what the pros/cons of
embedding vs separate file would be for optimization info.

Dave

>
> Thanks a lot.
>
> Qing
>
> > On Jan 29, 2026, at 11:58, David Malcolm <[email protected]>
> > wrote:
> >
> > On Wed, 2026-01-28 at 17:53 -0500, Siddhesh Poyarekar wrote:
> > > On 2026-01-28 10:41, Qing Zhao via Gcc wrote:
> > > > Does GCC provide any option to record optimization information,
> > > > such as inlining, loop transformation, profiling consistency,
> > > > etc into specific sections of binary code?
> > >
> > > I may be misremembering this, but I think David had some ideas
> > > about doing something like this in SARIF.
> > >
> >
> > Several thoughts here:
> >
> > (a) I've written a prototype that embeds SARIF as an ELF section in
> > the generated object file, rather like debuginfo (my idea at the
> > time being that a binary could contain within it its build flags and
> > other metadata, and its diagnostics, etc).  I don't think I posted
> > it to the mailing list though.
> >
> > (b) A long time ago I prototyped a gcc implementation of llvm's idea
> > of optimization remarks, to send optimization info through the
> > diagnostics subsystem, but IIRC that work ended up as the revamp of
> > optinfo (in GCC 9?; see my Cauldron 2018 talk on optimization
> > records), which generalized some of the internals of how we track
> > optimization info.  The machine-readable output is a custom
> > json-based format.
> >
> > (c) SARIF would probably be a good fit for optimization records;
> > it's machine-readable, and has a rich vocabulary for source
> > locations, code constructs, machine locations, etc; IDEs and other
> > tooling understand it, so they'd get a source-level view of
> > optimization info "for free".  Note that currently our SARIF output
> > captures the contents of every source file referred to by any
> > diagnostics, but we could e.g. capture every source file/header used
> > during the compile, and could capture e.g. SHA1 sums rather than
> > file content.
> >
> > (d) I've added the ability to add custom info to diagnostic sinks;
> > see e.g. capturing CFG information in
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e20eee3897ae8cd0f2212dad0710d64df8f1a956
> >
> > (e) I've added a new publish/subscribe framework to GCC for loosely
> > coupled notifications that would probably help with the
> > implementation (to avoid needing to have the diagnostics subsystem
> > "know" too much about the optimizer).
> >
> > So possible GCC 17 material might be:
> >
> > (d) add a new sink to the optinfo subsystem that adds a new pub/sub
> > channel about optimization info, and sends notifications about the
> > optimization records there
> >
> > (e) add a new option to -fdiagnostics-add-output to capture optinfo,
> > which when enabled subscribes the diagnostic sink to the optinfo
> > notifications channel.  Or we just skip (d) and work more directly
> > with optinfo, but (d) allows some extra flexibility e.g. for plugins
> > that listen for optimization decisions.
> >
> > (f) potentially add a new option to the SARIF sink to support
> > embedding the data in an ELF section, rather than writing to a file
> > (as per (a) above).
> >
> > Brainstorming, the user might be able to do something like:
> >
> >   -fdiagnostics-add-output=sarif:elf-section=optimizations,optinfo=inline
> >
> > or whatnot, and have an ELF section capturing the decisions made by
> > the inliner.
> >
> > Or we could have an option to send optinfo as diagnostics, like
> > LLVM's optimization records (and (b) above), and have the
> > diagnostics sinks handle them that way (text, SARIF, HTML).
> >
> > Dave
> >
>
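
To make the pub/sub idea quoted above (the optinfo channel and a diagnostic sink subscribing to it) a little more concrete, here is a rough Python sketch of the shape of that design. It is purely illustrative and is not GCC's API: GCC's framework is C++, and every name here (Channel, optinfo_channel, FilteringSink, the record fields) is made up for the example.

```python
from typing import Callable, List


class Channel:
    """Minimal publish/subscribe channel: publishers never see the subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: dict) -> None:
        for callback in self._subscribers:
            callback(message)


# Hypothetical channel that an optinfo sink would publish records to.
optinfo_channel = Channel()


class FilteringSink:
    """Hypothetical stand-in for a diagnostic sink that was asked to
    capture only some kinds of optimization record (e.g. inlining)."""

    def __init__(self, kinds) -> None:
        self.kinds = set(kinds)
        self.records: List[dict] = []

    def on_optinfo(self, record: dict) -> None:
        # Keep only the kinds this sink subscribed for.
        if record["kind"] in self.kinds:
            self.records.append(record)


sink = FilteringSink(kinds={"inline"})
optinfo_channel.subscribe(sink.on_optinfo)

# The "optimizer" side only knows about the channel, not about the sink.
optinfo_channel.publish({"kind": "inline", "caller": "main", "callee": "helper"})
optinfo_channel.publish({"kind": "loop", "function": "main"})

print(sink.records)
```

The point of the indirection is the one made in the quoted mail: the publishing side does not need to know which sinks, or plugins, are listening.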
