On Wed, 2026-03-04 at 15:42 +0530, Ridham Khurana wrote:
> Hi Dave,
> 
> As I was going through both files, I noticed that in the file
> *gimple-ssa-sprintf.cc*, directive struct bundles a tree arg directly
> and
> in the file *c-format.cc*, argument parser takes tree
> first_fillin_param.
> But in analyzer part, arguments are const svalue*(via
> *call_details::get_arg_svalue()*) and not just tree.

Yes; the analyzer has its own "symbolic value" class, which, for better
or worse, isn't "tree".

>  So, this means the
> shared layer should only handle format string parsing and leave the
> argument binding to each corresponding subsystem respectively, which
> I
> think was already implying by your iterator sketch.
> 
> I made some rough sketch of the API:
> 
> struct format_string_action {
>    enum kind { LITERAL, DIRECTIVE } kind; // specifying LITERAL or
> DIRECTIVE
>    offset, length; // specifying exact position in the raw format
> string
> 
>    // For directives:
>    specifier;      // 'd', 's', 'f', etc
>    length_mod;     // 'l', 'h', 'L', etc
>    flags;          // '-', '+', ' ', '#', '0'
>    width;          // value, or can be -1 if unspecified, or '*'
>    precision;      // value, or can be -1 if unspecified, or '*'
> };
> 
> class format_string_iterator{
> public:
>    bool done_p() const; // check if completely parsed
>    void next(); // advance one
>    format_string_action get_next_action() const; // just lookup for
> the
> next , not advancing to the next
>    format_string_action get_current() const; // getting the current
> action
> from action-list
> 
> private:
>    const char* raw_string; // original format string is stored here
>    const char* pos; // current position in raw_string
>    format_string_action latest; // latest parsed in action-list
> };

Looks promising.

> 
> Also, one question about this format_string_action, should it have
> some
> variable containing the number of arguments that are consumed by the
> directive? Example, %*.*d consumes 3 (width,precision,value) while %d
> consumes only 1. So, should this be stored in the shared layer or the
> subsystems should handle it in their own files, when seeing width='*'
> and
> precision='*'?

Perhaps the shared layer should provide a list of expected types to
consume from the varargs, so that e.g. "%*.*d" consumes [int, int,
int], whereas %d just consumes [int].

Dave

> 
> Best,
> Ridham Khurana
> 
> On Fri, Feb 20, 2026 at 2:21 AM David Malcolm <[email protected]>
> wrote:
> 
> > On Thu, 2026-02-19 at 16:50 +0530, Ridham Khurana wrote:
> > > Hi Dave
> > 
> > Hi Ridham.
> > 
> > Thanks for taking this discussion to the mailing list.
> > 
> > For reference of other people, this is the GSoC proposal relating
> > to PR
> > 107017 (adding format string support to -fanalyzer).
> > 
> > > Thanks for the clarifications and the detailed pointers. Here is
> > > my
> > > understanding of the design architecture and progress so far:
> > > 
> > > I have been going through both files c-format.cc and gimple-ssa-
> > > sprintf.cc,
> > > and I think I understand the structure/architecture you are
> > > suggesting. So,
> > > the idea is to introduce a format_parser class (that you
> > > mentioned in
> > > last
> > > email), that takes the raw format string as input and breaks it
> > > into
> > > different but sequential components, for example literal text
> > > segments or
> > > directives like %d or %f. I have been calling this sequential
> > > structure the
> > > "action-map", just to address it with a name. The main point of
> > > this
> > > shared
> > > approach is to avoid having the frontend, GIMPLE and
> > > analyzer(upcoming) to
> > > run their own logic to decode it, but use this common parser, and
> > > instead
> > > have them all start from the same parsed action-map.
> > 
> > (nods)
> > 
> > Maybe an "action_list" rather than "action_map", given that it has
> > a
> > sequence?  (but that risks a "what color shall we paint the
> > bikeshed"
> > discussion)
> > 
> > > The
> > > "format_string_iterator" will then provide us an easy and
> > > consistent
> > > way
> > > for different files to walk through the action-map, at the same
> > > time
> > > make
> > > sure the internals of the action-map are not exposed.
> > 
> > (nods)
> > 
> > > From this, every
> > > subsystem that requires going through the action-map can use it
> > > with
> > > its
> > > own logic as required, like argument checking in frontend,
> > > sprintf
> > > folding
> > > in GIMPLE, and mainly modelling memory-effects in analyzer, while
> > > keeping
> > > all the existing semantic algorithms in their own respective
> > > files.
> > 
> > A simple way of doing this might be for the shared code to generate
> > a
> > std::list<format_action> (or a std::vector), which would be your
> > action-map, but another approach might be to eschew an object for
> > the
> > action-map, and instead have format_string_iterator provide
> > "actions"
> > on demand as clients iterate through the actions, something like:
> > 
> > class format_string_iterator
> > {
> > public:
> >    bool done_p () const;
> >    void next ();
> >    format_string_action get_next_action () const;
> > 
> > private:
> >    /* implementation-specific details here */
> > };
> > 
> > or somesuch.
> > 
> > I'm not sure which approach is better.
> > 
> > IIRC, the current implementations in both c-format.cc and gimple-
> > ssa-
> > sprintf.cc are structured around iterations, rather than reusing an
> > existing container and populating/iterating over that.
> > 
> > > 
> > > One more thing that I was thinking is, where should the shared
> > > format_parser and format_string_iterator classes live in the
> > > tree?
> > > Right
> > > now, c-format.cc is under gcc/c-family/, gimple-ssa-sprintf.cc is
> > > directly
> > > under gcc, and the analyzer is under gcc/analyzer/. Since the
> > > shared
> > > code
> > > needs to be accessible for all three, I am guessing it should go
> > > directly
> > > under gcc, but I wanted to confirm if you have some other
> > > approach to
> > > deal
> > > with this before I start thinking about the layout.
> > 
> > I think directly under the gcc/ directory, as part of
> > gcc/Makefile.in's
> > OBJs.
> > 
> > Maybe we should introduce a gcc::format_string namespace for the
> > shared
> > code to live in???
> > 
> > One thing to remember is that patches need to be reviewable - it's
> > good
> > to avoid a huge patch that deletes a big chunk of code and replaces
> > it
> > with a different other chunk of code.  So it might be best to build
> > this up as a series of patches, expressing refactorings in c-
> > format.cc
> > that eventually move the new classes elsewhere (given that IIRC c-
> > format.cc currently got the most complicated logic).  But maybe we
> > don't need to worry about that during prototyping/experimentation.
> > 
> > A lot of the complexity in c-format.cc is that if we emit a
> > warning, we
> > have to do some work to generate source-location information for
> > the
> > parts of the format string of interest, but we don't want to do
> > that
> > work unless it's going to be needed (e.g. finding the exact
> > location of
> > a "%i").  So we'll want the action-map to keep that deferral of
> > work,
> > so that we only do it if we issue a diagnostic relating to part of
> > the
> > format string.
> > 
> > > On the region_model vs sm_state_map distinction, I think I follow
> > > what you
> > > mean. I am looking at how both appear in the .dot dumps (e.g. the
> > > rmodel:
> > > part showing memory state, and the malloc: part showing checker
> > > states like
> > > unchecked → freed). I am still going through the callbacks to
> > > understand
> > > how the two sides connect, but I will keep this in mind as I
> > > continue
> > > studying more about the analyzer internals.
> > 
> > (nods)
> > 
> > > 
> > > My next step is to continue my deep dive into both
> > > implementations
> > > and
> > > start planning out how the shared parser API could look,
> > > depending on
> > > where
> > > it will live.
> > 
> > Sounds great.
> > 
> > Let me know if you have any questions.
> > 
> > Thanks
> > Dave
> > 
> > 

Reply via email to