On Mon, 2026-03-16 at 10:58 +0500, Islombek Ismoilov wrote:
> Hi Dave
> I’ve spent more time digging into the analyzer and
> c-family/c-format internals. I want to share my findings on the
> specific problems I’ve identified and my proposed approach to
> solving them for Option A.
>
> Currently, we have two disparate worlds:
>
> 1. Frontend-based (c-format.cc): Excellent at parsing complex ISO
>    C/POSIX format strings but tied to the frontend tree structures
>    and location_t.
> 2. Middle-end based (gimple-ssa-sprintf.cc): Good at range-based
>    overflow estimation during optimization, but lacking the
>    path-sensitive depth of the analyzer.
>
> My Proposed Approach:
>
> 1. Unified Format Parser (The "Middle-end Library") My primary goal
> is to extract the core parsing logic from c-format.cc into a new,
> frontend-independent component (e.g., gcc/format-parser.cc).

Sounds reasonable, but why "Middle-end Library"?  Presumably this code
would be used by all of c-format.cc, gimple-ssa-sprintf.cc, and the
analyzer.

> - I plan to define a shared format_string_spec structure that
>   represents the "intent" of a format specifier (type, width,
>   precision, flags) without relying on frontend-specific data.

You might want to also associate a direction with format string specs,
to handle things like *scanf, which write back through pointer
arguments rather than reading through them.

> - This will allow -fanalyzer to invoke the parser on GIMPLE strings
>   and receive a structured representation of what to expect.

Sounds good.  Ideally we should provide source location information
when we complain about a format string operation.  There's some awkward
logic around getting at location_t values within a string literal; see
the substring location code.  We'd probably want to capture the range
of chars within the string for each format-string-spec entry, and get a
location_t for that on demand when issuing diagnostics.

> 2. Path-Sensitive Range Analysis Integrating this with
> -Wanalyzer-out-of-bounds is where the real value lies.
>
> - By leveraging the region_model, I want to map the svalue of
>   arguments to the constraints defined by the format string.
> - For example, if the analyzer knows a variable n is in the range
>   [1000, 9999] and it's being printed into a 4-byte buffer via
>   sprintf, we can emit a precise path-sensitive overflow warning
>   that the current middle-end might miss.

(nods)

> 3. Implementation Strategy
>
> - Phase 1: Identify and isolate the "state-machine" part of the
>   existing format parser.

Would this essentially be a refactoring of c-format.cc to extract an
iterator class?  You'll want to get familiar with running the
regression tests, particularly for -Wformat, to make sure that the
refactorings don't change behavior.
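
To make the iterator idea concrete, here is a minimal sketch in plain
C.  It is not GCC code: every name in it (fmt_dir, fmt_spec, fmt_iter,
fmt_iter_init, fmt_iter_next) is hypothetical, it handles only width,
precision and the conversion character (no flags, '*', or length
modifiers), and it records byte offsets into the string rather than
location_t values, on the assumption that a location could be computed
from those offsets on demand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: does the callee read the argument (printf family)
   or write back through it (scanf family)?  */
enum fmt_dir { FMT_DIR_READ, FMT_DIR_WRITE };

/* Hypothetical frontend-independent description of one conversion.  */
struct fmt_spec
{
  size_t start;     /* Offset of the '%' within the format string.  */
  size_t end;       /* Offset one past the conversion character.  */
  int width;        /* Field width, or -1 if absent.  */
  int precision;    /* Precision, or -1 if absent.  */
  char conv;        /* Conversion character, e.g. 'd' or 's'.  */
  enum fmt_dir dir; /* Copied from the iterator.  */
};

struct fmt_iter
{
  const char *fmt;
  size_t pos;
  enum fmt_dir dir;
};

static void
fmt_iter_init (struct fmt_iter *it, const char *fmt, enum fmt_dir dir)
{
  it->fmt = fmt;
  it->pos = 0;
  it->dir = dir;
}

/* Parse a run of decimal digits at *POS, advancing *POS past them.
   Return the value, or -1 if there are no digits.  */
static int
parse_number (const char *fmt, size_t *pos)
{
  if (fmt[*pos] < '0' || fmt[*pos] > '9')
    return -1;
  int n = 0;
  while (fmt[*pos] >= '0' && fmt[*pos] <= '9')
    n = n * 10 + (fmt[(*pos)++] - '0');
  return n;
}

/* Fill *OUT with the next conversion and return true, or return false
   at the end of the string or on a truncated conversion.  "%%" is
   skipped as a literal percent sign.  */
static bool
fmt_iter_next (struct fmt_iter *it, struct fmt_spec *out)
{
  assert (it != NULL && out != NULL);
  const char *f = it->fmt;
  size_t i = it->pos;

  for (;;)
    {
      while (f[i] != '\0' && f[i] != '%')
        i++;
      if (f[i] == '\0')
        {
          it->pos = i;
          return false;
        }
      if (f[i + 1] != '%')
        break;
      i += 2;
    }

  out->start = i++;
  out->width = parse_number (f, &i);
  out->precision = -1;
  if (f[i] == '.')
    {
      i++;
      out->precision = parse_number (f, &i);
      if (out->precision < 0)
        out->precision = 0; /* A bare "." means precision zero.  */
    }
  if (f[i] == '\0')
    {
      it->pos = i;
      return false;
    }
  out->conv = f[i++];
  out->end = i;
  out->dir = it->dir;
  it->pos = i;
  return true;
}
```

A caller would initialize the iterator with FMT_DIR_READ for a
printf-style call or FMT_DIR_WRITE for a scanf-style call, then loop
on fmt_iter_next; the [start, end) range of each entry is what a
diagnostic would point at.
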

> - Phase 2: Implement a format_string_checker class within the
>   analyzer that uses this isolated parser.
> - Phase 3: Create a bridge between the parser's requirements and the
>   analyzer’s range_query and region_model.

I'm not quite sure about phase 3 - what is a range_query here?

Hope this is constructive
Dave

> Tue, 10 Mar 2026 at 05:54, David Malcolm <[email protected]>:
>
> > On Mon, 2026-03-09 at 19:33 +0500, Islombek Ismoilov wrote:
> > > Hi Dave,
> > >
> > > Thank you for your message.
> > >
> > > Yes, I am able to make changes to GCC, rebuild it, and step
> > > through the modified code using a debugger. I have already
> > > tested this workflow and confirmed that everything works as
> > > expected.
> >
> > Good.
> >
> > > Could you also briefly describe the project and how it will work
> > > technically?
> >
> > Have a look at the relevant parts of the SummerOfCode wiki page,
> > and have a look at
> > https://gcc.gnu.org/wiki/StaticAnalyzer
> >
> > Dave
> >
> > > Best regards,
> > > Islom
> > >
> > > Mon, 9 Mar 2026, 18:33 David Malcolm <[email protected]>:
> > >
> > > > On Sun, 2026-03-08 at 15:17 +0500, Islombek Ismoilov wrote:
> > > > > Hi Dave
> > > > > Thanks for the advice. I've fixed the issue by performing a
> > > > > clean build. I removed the old GCC source directory
> > > > > entirely, re-downloaded the source, and reapplied my
> > > > > changes. It is working correctly now.
> > > >
> > > > Excellent.
> > > >
> > > > Are you able to make changes to gcc, rebuild it, and step
> > > > through the changed code in a debugger?  That's a good
> > > > prerequisite that we want to get applicants to achieve.
> > > >
> > > > Dave
> > > >
> > > > > best regards,
> > > > > Islom
> > > > >
> > > > > Sun, 8 Mar 2026 at 05:32, David Malcolm
> > > > > <[email protected]>:
> > > > >
> > > > > > On Thu, 2026-03-05 at 00:10 +0100, Martin Jambor wrote:
> > > > > > > Hello Islombek,
> > > > > > >
> > > > > > > On Tue, Mar 03 2026, Islombek Ismoilov via Gcc wrote:
> > > > > > > > Dear David Malcolm
> > > > > > > >
> > > > > > > > I would like to share my progress on building and
> > > > > > > > modifying the GNU compiler from source.
> > > > > > > >
> > > > > > > > I successfully built GCC from the source code. During
> > > > > > > > the process, I resolved dependency and configuration
> > > > > > > > issues that arose.
> > > > > > > >
> > > > > > > > After the build was completed, I tested the compiled
> > > > > > > > compiler using a simple test.c file.
> > > > > > > >
> > > > > > > > int main(){
> > > > > > > >   return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > The compilation and execution worked correctly,
> > > > > > > > confirming that the build was functioning as expected.
> > > > > > > >
> > > > > > > > Then I started experimenting with modifications in the
> > > > > > > > source code. I edited the file c-parser.cc,
> > > > > > > > specifically the function "c_parser_translation_unit",
> > > > > > > > and added the following line:
> > > > > > > >
> > > > > > > > warning (0, "Good Job");
> > > > > > > >
> > > > > > > > My goal was to introduce a warning that would appear
> > > > > > > > during each compilation.
> > > > > > >
> > > > > > > When I want to check that some code gets executed in the
> > > > > > > simplest way, I just resort to fprintf.
> > > > > > > The trick is to direct the output to stderr.  Putting
> > > > > > >
> > > > > > > fprintf (stderr, "Good job!\n");
> > > > > > >
> > > > > > > at the beginning of c_parser_translation_unit does what
> > > > > > > you'd expect it to do.
> > > > > > >
> > > > > > > > However, after making the changes and rebuilding, the
> > > > > > > > cc1 binary was not generated. The build process
> > > > > > > > completes the configuration stage but fails to produce
> > > > > > > > the main compiler binary. I restored c-parser.cc to
> > > > > > > > its original state, yet the issue still persists: the
> > > > > > > > build still finishes without generating cc1.
> > > > > > >
> > > > > > > This is of course strange.  What were the commands you
> > > > > > > issued (in which directories) and what were the error
> > > > > > > messages?  There should be no need to re-run
> > > > > > > configuration after such a small change.  Did make exit
> > > > > > > with exit code zero?
> > > > > > >
> > > > > > > Did you disable bootstrap during the first configuration
> > > > > > > step?
> > > > > >
> > > > > > This is very important when preparing to make changes to
> > > > > > GCC, otherwise making edits is very tedious.  Islombek,
> > > > > > did you check this?
> > > > > >
> > > > > > > > What do you advise?
> > > > > > >
> > > > > > > I'm afraid we need more details; after you restore the
> > > > > > > file, all should be as before, of course.
> > > > > > >
> > > > > > > Good luck debugging this and with GSoC in general.
> > > > > >
> > > > > > Islombek: did you get any further with this, or are you
> > > > > > stuck?
> > > > > >
> > > > > > Hope this is constructive
> > > > > > Dave
