Re: GSoC 2026: Extend the static analysis pass

Ridham Khurana via Gcc Thu, 26 Feb 2026 03:14:30 -0800

Hi Dave,
Thanks for your detailed response

It makes sense to call sequential structure as “action-list” rather than
"action-map"


I have been studying both iteration loops for the last few days, and I
think the lazy approach will be better. Both already follow this pattern
(*check_format_info_main* in *c-format.cc* and *compute_format_length* in
*gimple-ssa-sprintf.cc* ) , that is they are using a pointer to move
forward through the raw string and process each directive one at a time.
(*while (*format_chars != 0)* in *c-format.cc* and *for(const char *pf =
info.fmtstr; ; ++dirno)* in *gimple-ssa-sprintf.cc*).

Using this lazy approach will also naturally preserve the deferred
source-location computation that you mentioned , and at the same time , the
mapping of directive to its exact location (exact source column to be
precise) will be done only if required.

gcc/ with a *gcc::format_string* namespace appears perfect for this
project. Also, I understand to keep the patches as a sequence of patches
instead of one huge patch, starting with refactoring in *c-format.cc* and
then moving forward with patches containing small steps.

My next step will be to start sketching the shared internal iterator API
 on the basis of patterns I am observing in both files.

Best,
Ridham Khurana



On Fri, Feb 20, 2026 at 2:21 AM David Malcolm <[email protected]> wrote:

> On Thu, 2026-02-19 at 16:50 +0530, Ridham Khurana wrote:
> > Hi Dave
>
> Hi Ridham.
>
> Thanks for taking this discussion to the mailing list.
>
> For reference of other people, this is the GSoC proposal relating to PR
> 107017 (adding format string support to -fanalyzer).
>
> > Thanks for the clarifications and the detailed pointers. Here is my
> > understanding of the design architecture and progress so far:
> >
> > I have been going through both files c-format.cc and gimple-ssa-
> > sprintf.cc,
> > and I think I understand the structure/architecture you are
> > suggesting. So,
> > the idea is to introduce a format_parser class (that you mentioned in
> > last
> > email), that takes the raw format string as input and breaks it into
> > different but sequential components, for example literal text
> > segments or
> > directives like %d or %f. I have been calling this sequential
> > structure the
> > "action-map", just to address it with a name. The main point of this
> > shared
> > approach is to avoid having the frontend, GIMPLE and
> > analyzer(upcoming) to
> > run their own logic to decode it, but use this common parser, and
> > instead
> > have them all start from the same parsed action-map.
>
> (nods)
>
> Maybe an "action_list" rather than "action_map", given that it has a
> sequence?  (but that risks a "what color shall we paint the bikeshed"
> discussion)
>
> > The
> > "format_string_iterator" will then provide us an easy and consistent
> > way
> > for different files to walk through the action-map, at the same time
> > make
> > sure the internals of the action-map are not exposed.
>
> (nods)
>
> > From this, every
> > subsystem that requires going through the action-map can use it with
> > its
> > own logic as required, like argument checking in frontend, sprintf
> > folding
> > in GIMPLE, and mainly modelling memory-effects in analyzer, while
> > keeping
> > all the existing semantic algorithms in their own respective files.
>
> A simple way of doing this might be for the shared code to generate a
> std::list<format_action> (or a std::vector), which would be your
> action-map, but another approach might be to eschew an object for the
> action-map, and instead have format_string_iterator provide "actions"
> on demand as clients iterate through the actions, something like:
>
> class format_string_iterator
> {
> public:
>    bool done_p () const;
>    void next ();
>    format_string_action get_next_action () const;
>
> private:
>    /* implementation-specific details here */
> };
>
> or somesuch.
>
> I'm not sure which approach is better.
>
> IIRC, the current implementations in both c-format.cc and gimple-ssa-
> sprintf.cc are structured around iterations, rather than reusing an
> existing container and populating/iterating over that.
>
> >
> > One more thing that I was thinking is, where should the shared
> > format_parser and format_string_iterator classes live in the tree?
> > Right
> > now, c-format.cc is under gcc/c-family/, gimple-ssa-sprintf.cc is
> > directly
> > under gcc, and the analyzer is under gcc/analyzer/. Since the shared
> > code
> > needs to be accessible for all three, I am guessing it should go
> > directly
> > under gcc, but I wanted to confirm if you have some other approach to
> > deal
> > with this before I start thinking about the layout.
>
> I think directly under the gcc/ directory, as part of gcc/Makefile.in's
> OBJs.
>
> Maybe we should introduce a gcc::format_string namespace for the shared
> code to live in???
>
> One thing to remember is that patches need to be reviewable - it's good
> to avoid a huge patch that deletes a big chunk of code and replaces it
> with a different other chunk of code.  So it might be best to build
> this up as a series of patches, expressing refactorings in c-format.cc
> that eventually move the new classes elsewhere (given that IIRC c-
> format.cc currently got the most complicated logic).  But maybe we
> don't need to worry about that during prototyping/experimentation.
>
> A lot of the complexity in c-format.cc is that if we emit a warning, we
> have to do some work to generate source-location information for the
> parts of the format string of interest, but we don't want to do that
> work unless it's going to be needed (e.g. finding the exact location of
> a "%i").  So we'll want the action-map to keep that deferral of work,
> so that we only do it if we issue a diagnostic relating to part of the
> format string.
>
> > On the region_model vs sm_state_map distinction, I think I follow
> > what you
> > mean. I am looking at how both appear in the .dot dumps (e.g. the
> > rmodel:
> > part showing memory state, and the malloc: part showing checker
> > states like
> > unchecked → freed). I am still going through the callbacks to
> > understand
> > how the two sides connect, but I will keep this in mind as I continue
> > studying more about the analyzer internals.
>
> (nods)
>
> >
> > My next step is to continue my deep dive into both implementations
> > and
> > start planning out how the shared parser API could look, depending on
> > where
> > it will live.
>
> Sounds great.
>
> Let me know if you have any questions.
>
> Thanks
> Dave
>
>

Re: GSoC 2026: Extend the static analysis pass

Reply via email to