On Fri, 2026-03-13 at 12:23 +0530, Ridham Khurana wrote:
> Hi Dave,
> 
> Thanks for confirming the expected type argument; I will add it in
> the shared layer.
> 
> While going through the current analyzer implementation, I noticed
> that arguments to function calls are retrieved through
> *call_details::get_arg_svalue()* and then handled as const svalue*,
> rather than as *tree* nodes like in the frontend and GIMPLE passes.
> From what I can understand, library-call behaviour is modelled
> through *known_function* handlers interacting with the
> *region_model* (for example through *impl_call_pre* in kf_*
> handlers), and the existing checks for functions like printf are
> mostly driven by the format attribute and validation of the format
> string arguments (for example using
> *check_for_null_terminated_string_arg()*), rather than by
> interpreting the individual directives.

That's correct.

> 
> But one thing I am not sure about is where the shared string parser
> should be integrated on the analyzer side. Maybe it should be
> triggered through the attribute-based path, or perhaps it is better
> to use it inside the individual kf_* handlers for printf-style
> functions.

I'm not sure.  I think we want a subroutine inside the analyzer that
can be called from either place, and then see how well each approach
works.

On the subject of known_function handlers, some other GSoC candidates
have had success in making patches that add new known_function
subclasses for specific POSIX/C stdlib entrypoints.  This is a
relatively easy and self-contained way to improve -fanalyzer, and it's
a good way to demonstrate technical prowess, and to shake out any
problems that a candidate might run into building/debugging gcc on
their hardware.  It overlaps with the format-string support, so would
be a useful learning experience - but you'd have to choose a simpler
API entrypoint (obviously we don't have the format-string parsing in
convenient modular form yet).

> 
> Also, before starting to draft the official proposal, I wanted to
> confirm
> the expected size of this project. From my current understanding, it
> would
> be 350 hours, 

I think 350 hours is the better choice; this is a rather ambitious
project.


> divided into 2 major phases: the first phase would unify the
> parsing logic among all 3 subsystems

(it would be the *2* subsystems at this time, since the analyzer
doesn't yet support format strings)

> and the second phase would be the actual work on the analyzer part.
> Please let me know if this matches your expectations or whether you
> would prefer the 175-hour scope.

FWIW I'm always a bit sceptical of timetables that rigidly divide
projects into phases - it feels too much like the "waterfall" model of
development.  But yes, splitting out the parsing logic from the other 2
subsystems is a prerequisite before using it in -fanalyzer (I suppose
you could have a proof-of-concept that recognizes hardcoded strings and
provides the analyzer with the (hardcoded) action list, but that's
probably wasted effort compared to simply doing the refactoring work).

A useful exercise would be to get familiar with running gcc's full test
suite, and verifying that a patch doesn't regress anything, since
that's very important during the refactoring of the existing code.
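The usual workflow looks roughly like the following (a sketch, assuming a configured build tree; exact paths and .sum file locations vary with your configuration):

```shell
# From the build directory: baseline run of the full testsuite
# (-k keeps going past failures; -jN parallelizes).
make -j8 -k check

# Save the baseline summary, then apply the patch, rebuild, re-test,
# and compare; the compare_tests script ships in the GCC source tree.
cp gcc/testsuite/gcc/gcc.sum gcc.sum.before
# ... apply patch, rebuild, re-run "make -j8 -k check" ...
/path/to/gcc-src/contrib/compare_tests gcc.sum.before \
    gcc/testsuite/gcc/gcc.sum

# During development, running just the analyzer tests is much faster:
make check-gcc RUNTESTFLAGS="analyzer.exp"
```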

Hope this is helpful and makes sense; let me know if you have any
questions.
Dave
