Hi Dave, Thanks for your detailed response It makes sense to call sequential structure as “action-list” rather than "action-map"
I have been studying both iteration loops for the last few days, and I think the lazy approach will be better. Both already follow this pattern (*check_format_info_main* in *c-format.cc* and *compute_format_length* in *gimple-ssa-sprintf.cc* ) , that is they are using a pointer to move forward through the raw string and process each directive one at a time. (*while (*format_chars != 0)* in *c-format.cc* and *for(const char *pf = info.fmtstr; ; ++dirno)* in *gimple-ssa-sprintf.cc*). Using this lazy approach will also naturally preserve the deferred source-location computation that you mentioned , and at the same time , the mapping of directive to its exact location (exact source column to be precise) will be done only if required. gcc/ with a *gcc::format_string* namespace appears perfect for this project. Also, I understand to keep the patches as a sequence of patches instead of one huge patch, starting with refactoring in *c-format.cc* and then moving forward with patches containing small steps. My next step will be to start sketching the shared internal iterator API on the basis of patterns I am observing in both files. Best, Ridham Khurana On Fri, Feb 20, 2026 at 2:21 AM David Malcolm <[email protected]> wrote: > On Thu, 2026-02-19 at 16:50 +0530, Ridham Khurana wrote: > > Hi Dave > > Hi Ridham. > > Thanks for taking this discussion to the mailing list. > > For reference of other people, this is the GSoC proposal relating to PR > 107017 (adding format string support to -fanalyzer). > > > Thanks for the clarifications and the detailed pointers. Here is my > > understanding of the design architecture and progress so far: > > > > I have been going through both files c-format.cc and gimple-ssa- > > sprintf.cc, > > and I think I understand the structure/architecture you are > > suggesting. So, > > the idea is to introduce a format_parser class (that you mentioned in > > last > > email), that takes the raw format string as input and breaks it into > > different but sequential components, for example literal text > > segments or > > directives like %d or %f. I have been calling this sequential > > structure the > > "action-map", just to address it with a name. The main point of this > > shared > > approach is to avoid having the frontend, GIMPLE and > > analyzer(upcoming) to > > run their own logic to decode it, but use this common parser, and > > instead > > have them all start from the same parsed action-map. > > (nods) > > Maybe an "action_list" rather than "action_map", given that it has a > sequence? (but that risks a "what color shall we paint the bikeshed" > discussion) > > > The > > "format_string_iterator" will then provide us an easy and consistent > > way > > for different files to walk through the action-map, at the same time > > make > > sure the internals of the action-map are not exposed. > > (nods) > > > From this, every > > subsystem that requires going through the action-map can use it with > > its > > own logic as required, like argument checking in frontend, sprintf > > folding > > in GIMPLE, and mainly modelling memory-effects in analyzer, while > > keeping > > all the existing semantic algorithms in their own respective files. > > A simple way of doing this might be for the shared code to generate a > std::list<format_action> (or a std::vector), which would be your > action-map, but another approach might be to eschew an object for the > action-map, and instead have format_string_iterator provide "actions" > on demand as clients iterate through the actions, something like: > > class format_string_iterator > { > public: > bool done_p () const; > void next (); > format_string_action get_next_action () const; > > private: > /* implementation-specific details here */ > }; > > or somesuch. > > I'm not sure which approach is better. > > IIRC, the current implementations in both c-format.cc and gimple-ssa- > sprintf.cc are structured around iterations, rather than reusing an > existing container and populating/iterating over that. > > > > > One more thing that I was thinking is, where should the shared > > format_parser and format_string_iterator classes live in the tree? > > Right > > now, c-format.cc is under gcc/c-family/, gimple-ssa-sprintf.cc is > > directly > > under gcc, and the analyzer is under gcc/analyzer/. Since the shared > > code > > needs to be accessible for all three, I am guessing it should go > > directly > > under gcc, but I wanted to confirm if you have some other approach to > > deal > > with this before I start thinking about the layout. > > I think directly under the gcc/ directory, as part of gcc/Makefile.in's > OBJs. > > Maybe we should introduce a gcc::format_string namespace for the shared > code to live in??? > > One thing to remember is that patches need to be reviewable - it's good > to avoid a huge patch that deletes a big chunk of code and replaces it > with a different other chunk of code. So it might be best to build > this up as a series of patches, expressing refactorings in c-format.cc > that eventually move the new classes elsewhere (given that IIRC c- > format.cc currently got the most complicated logic). But maybe we > don't need to worry about that during prototyping/experimentation. > > A lot of the complexity in c-format.cc is that if we emit a warning, we > have to do some work to generate source-location information for the > parts of the format string of interest, but we don't want to do that > work unless it's going to be needed (e.g. finding the exact location of > a "%i"). So we'll want the action-map to keep that deferral of work, > so that we only do it if we issue a diagnostic relating to part of the > format string. > > > On the region_model vs sm_state_map distinction, I think I follow > > what you > > mean. I am looking at how both appear in the .dot dumps (e.g. the > > rmodel: > > part showing memory state, and the malloc: part showing checker > > states like > > unchecked → freed). I am still going through the callbacks to > > understand > > how the two sides connect, but I will keep this in mind as I continue > > studying more about the analyzer internals. > > (nods) > > > > > My next step is to continue my deep dive into both implementations > > and > > start planning out how the shared parser API could look, depending on > > where > > it will live. > > Sounds great. > > Let me know if you have any questions. > > Thanks > Dave > >
