Re: Draft GSoC 2026 Proposal: CPython API Checking, PR 107646

David Malcolm via Gcc Tue, 17 Mar 2026 14:23:35 -0700

On Mon, 2026-03-16 at 07:17 +0530, Saksham Gupta wrote:
> Hi David,
> 
> Thanks for the incredibly detailed review! Challenge absolutely
> accepted. :)
> 
> You were totally right about the state_machine approach. Given how the
> Py_INCREF/DECREF macros expand, tracking pointers directly in the
> region_model and validating against ob_refcnt during stack frame pops
> makes way more sense. I completely rewrote Phase 2 to reflect this.
> 
> I also folded all of your other notes into the attached V2 draft:
> 
>    - Added the context on Eric's 2023 work and clarified the scope around
>    legacy C extensions and Python 3.14.
>    - Dropped the brittle ascii-art tests in favor of strictly using
>    dg-warning and dg-message.
>    - Moved the .editorconfig setup into the community bonding period.
>    - Updated the timeline to focus on categorizing the API early and using
>    helper attributes instead of trying to hardcode everything.
>    - Added the real-world integration testing (psycopg2, etc.) and the
>    stretch goals from the wiki.
> 
> I've attached the updated PDF. Let me know if this new region_model
> architecture aligns better with what you have in mind!


Thanks; this is a big improvement.

Note that the proposal's 350 hours over 12 weeks works out to almost 30
hours a week, which is a big chunk of time.  Is that going to be
achievable?  For example, do you have any exams or other things
scheduled over the summer?

Dave

> 
> Best regards,
> Saksham
> 
> 
> On Mon, 16 Mar 2026 at 06:24, David Malcolm <[email protected]>
> wrote:
> 
> > On Sun, 2026-03-15 at 12:23 +0530, Saksham Gupta wrote:
> > > Hi David,
> > > 
> > > I’ve attached the draft of my GSoC proposal for the CPython API
> > > checker. I haven't submitted it to the official portal yet; I wanted
> > > to run it by you first to catch any mistakes and make sure the
> > > technical direction actually makes sense.
> > > 
> > > I made sure to include your recent advice. The scope now explicitly
> > > targets Python 3.11+ to handle the PEP 683 changes. My Compile Farm
> > > account (am-saksham) is also fully set up, so I added that to the
> > > testing strategy, along with a quick example of handling CFG
> > > bifurcation for PyList_New failures.
> > > 
> > 
> > Hi Saksham
> > 
> > > If you have a few minutes next week, I’d love your brutal honesty on
> > > this.
> > 
> > Challenge accepted :)
> > 
> > One thing that might not be mentioned yet on the wiki page is that
> > the
> > existing plugin is the result of a previous GSoC project (by Eric
> > Feng,
> > in 2023):
> > https://summerofcode.withgoogle.com/archive/2023/projects/EzIUWs5x
> > https://gist.github.com/efric/9faa9cb19fe829b97a54d5c7eabf5e72
> > 
> > (I've added a link to the wiki)
> > 
> > You should update the wording of your proposal to mention this (and
> > e.g. how 3.11 broke the old code).
> > 
> > Re: 1. Abstract; probably worth noting that there are multiple ways
> > to
> > interface CPython with C: using libffi, using a binding generator
> > (such
> > as Cython), or writing C by hand.  This project is focusing on the
> > "writing C by hand" case, but we don't recommend people use this
> > approach; this is more about supporting legacy code.
> > 
> > Re 2. Motivation & Background:
> > 
> > "Crucially, the analysis will explicitly target CPython 3.11+
> > headers
> > as a baseline. This ensures accurate struct layouts,":  a nitpick:
> > note
> > that we don't want to have to care about precise in-memory layouts,
> > GCC's C frontend does this for us; what we care about is what
> > fields
> > there are and what their types are.  The region_model/store.cc code
> > does track things in terms of bit offsets, so we'll see those when
> > debugging, but the plugin should be written in terms of types and
> > fields.
> > 
> > "this project will integrate Python-specific domain knowledge
> > directly
> > into the analyzer core."  Really?  I was thinking that it's best to
> > keep this as a plugin, albeit an in-tree plugin.
> > 
> > "Crucially, the analysis will explicitly target CPython 3.11+
> > headers
> > as a baseline."  note that there have been other recent changes
> > beyond
> > PEP 683 as CPython developers have tried to optimize more
> > aggressively
> > than in the past (e.g. for JIT compilation).  The most recent
> > release
> > is 3.14, and that might well have other changes that the plugin
> > needs
> > to be aware of.  The ideal would be to support a wide range of 3.*
> > headers, but it's good to pick one and get that working first, to
> > avoid
> > getting swamped by compatibility concerns.
> > 
> > "Illustrative Example: The Silent Leak": looks good.
> > 
> > Re 3.2. Phase 2: Implementing the Reference Count State Machine:
> > Your implementation plan is rather different to what we tried
> > before,
> > in that you're proposing using a state_machine subclass to
> > associate
> > state with a pointer.  What we tried in 2023 is to count the number
> > of
> > pointers being stored pointing at each PyObject, and then compare
> > against the ob_refcnt, and complain at certain points when they got
> > out-of-sync (e.g. when the stack frame is popped).  This was
> > working
> > purely with the region_model/state code and didn't need a new
> > state_machine.  That approach did seem to work with the pre-PEP-683
> > implementation, but IIRC Eric got stuck spending a lot of his time
> > on
> > PyList_Append, and thus we only got a tiny subset of the API
> > covered -
> > but it did work.  Py_INCREF and Py_DECREF are typically macros, and
> > so
> > by the time the analyzer "sees" the user's code, all we see are
> > reference count increments, decrements, and conditionals, and this
> > is
> > captured for us in the store by the region_model code; I think it
> > would
> > be hard to implement using a state_machine (though maybe I'm
> > wrong).
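
To make the pointer-counting idea above concrete, here is a minimal,
self-contained sketch. It is not the plugin's code and does not use the
analyzer's region_model API. Every name in it (mock_object, MOCK_INCREF,
MOCK_DECREF, mock_dealloc, refcnt_mismatch) is a hypothetical stand-in,
and the macros are a simplified, pre-PEP-683-style approximation of what
the real headers expand to. It only shows that, after preprocessing, the
code is plain increments, decrements and conditionals on ob_refcnt, and
that a check at frame pop can compare ob_refcnt with the number of
pointers still known to refer to the object.

```c
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only the field that matters here.  */
typedef struct mock_object
{
  long ob_refcnt;
} mock_object;

/* Counts how often the (mock) deallocator ran.  */
static int dealloc_count;

static void
mock_dealloc (mock_object *op)
{
  (void) op;
  dealloc_count++;
}

/* Simplified, assumed expansions: after preprocessing, only an
   increment, a decrement and a conditional remain.  */
#define MOCK_INCREF(op) ((op)->ob_refcnt++)
#define MOCK_DECREF(op)                 \
  do                                    \
    {                                   \
      if (--(op)->ob_refcnt == 0)       \
        mock_dealloc (op);              \
    }                                   \
  while (0)

/* Toy version of the frame-pop check: compare ob_refcnt against the
   number of pointers known to refer to OBJ.  Positive means ob_refcnt
   is too high (likely leak), negative means too low, zero means in sync.  */
static long
refcnt_mismatch (const mock_object *obj, long num_known_pointers)
{
  return obj->ob_refcnt - num_known_pointers;
}

/* Buggy: takes an extra reference and never releases it.  After the
   frame is popped only the caller's pointer remains.  */
static long
leaky_frame (mock_object *obj)
{
  MOCK_INCREF (obj);
  return refcnt_mismatch (obj, 1);
}

/* Correct: the extra reference is released before returning.  */
static long
balanced_frame (mock_object *obj)
{
  MOCK_INCREF (obj);
  MOCK_DECREF (obj);
  return refcnt_mismatch (obj, 1);
}
```

Calling leaky_frame on an object whose only reference is held by the
caller reports a nonzero mismatch, whereas balanced_frame reports zero.
The real analyzer would derive both numbers symbolically from the store
rather than at run time.
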
> > 
> > Note that there's huge amounts of repetition in the API (e.g.
> > "succeeds, returning a new reference, or fails, returning null" is
> > a
> > very common pattern).  So please make plenty of use of helper
> > subroutines, or the attributes idea described on the project wiki
> > page.
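
One possible way to exploit that repetition is a table that maps each
API entry point to a behaviour pattern, so that a single helper handles
every function in a category. The sketch below is only an illustration
under assumptions: the enum, the table and classify_api are hypothetical
names, not part of the existing plugin or of the attributes idea on the
wiki. The four example entries follow the reference-ownership behaviour
documented for those CPython functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories of reference-ownership behaviour.  */
enum api_pattern
{
  API_UNKNOWN,
  API_NEW_REF_OR_NULL, /* Succeeds returning a new reference, or fails
                          returning NULL.  */
  API_BORROWED_REF,    /* Returns a borrowed reference.  */
  API_STEALS_REF       /* Steals a reference to one of its arguments.  */
};

struct api_entry
{
  const char *name;
  enum api_pattern pattern;
};

/* A tiny illustrative subset; a real table would cover far more.  */
static const struct api_entry api_table[] = {
  { "PyList_New", API_NEW_REF_OR_NULL },
  { "PyLong_FromLong", API_NEW_REF_OR_NULL },
  { "PyList_GetItem", API_BORROWED_REF },
  { "PyList_SetItem", API_STEALS_REF },
};

/* Look up NAME in the table; unknown functions get API_UNKNOWN so the
   caller can fall back to conservative handling.  */
static enum api_pattern
classify_api (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof (api_table) / sizeof (api_table[0]); i++)
    if (strcmp (api_table[i].name, name) == 0)
      return api_table[i].pattern;
  return API_UNKNOWN;
}
```

With this shape, adding coverage for another "new reference or NULL"
function is one more table row rather than another hand-written handler.
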
> > 
> > re "DejaGnu Regression Suite": re "the ascii-art execution paths"
> > note
> > that these tests tend to be "brittle" so we don't want many tests
> > expressed this way, if any at all - dg-warning and dg-message tend
> > to
> > be much more robust.
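
For illustration, a test written with dg-warning and dg-message might
look roughly like the sketch below. The mock types and functions are
hypothetical stand-ins so that the file is self-contained; a real test
would include the CPython headers and load the plugin. The strings
inside the directives are placeholders, not diagnostics that GCC or the
plugin is known to emit.

```c
/* { dg-do compile } */
/* { dg-options "-fanalyzer" } */

#include <stddef.h>

/* Hypothetical stand-in for PyObject.  */
typedef struct mock_object
{
  long ob_refcnt;
} mock_object;

static mock_object the_object;

/* Mock of the common "succeeds, returning a new reference, or fails,
   returning NULL" pattern.  */
static mock_object *
mock_new_ref (int fail)
{
  if (fail)
    return NULL;
  the_object.ob_refcnt = 1;
  return &the_object;
}

static void
mock_decref (mock_object *obj)
{
  obj->ob_refcnt--;
}

/* The early return on FLAG forgets to release the reference.  The
   directive texts are placeholders for whatever the plugin reports.  */
int
test_early_return (int fail, int flag)
{
  mock_object *obj = mock_new_ref (fail); /* { dg-message "PLACEHOLDER: new reference acquired here" } */
  if (!obj)
    return -1;
  if (flag)
    return 1; /* { dg-warning "PLACEHOLDER: reference leaked" } */
  mock_decref (obj);
  return 0;
}
```

Each directive is tied to a single source line and matches a regular
expression against the diagnostic text, so the test does not depend on
how the execution path is drawn.
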
> > 
> > re "5. Timeline & Milestones (350 Hours)": I suggest dropping the
> > mentions of the state_machine approach, which means rewriting this
> > section.  I like the idea of building up a suite of buggy
> > extensions.  You'll want most of them to be as simple as possible,
> > along with some larger examples for "integration testing".  I
> > recommend
> > early on categorizing the API into the various patterns of
> > ownership/borrowing/stealing etc, and identifying examples of each,
> > and
> > trying a simple example of each early on, to verify that the
> > overall
> > approach will work on all the cases.
> > 
> > I don't like "strict formatting to GNU coding standard" being done
> > at
> > the end.  Better to set up your editor early on to adhere to these,
> > and
> > then have this happen throughout.  IIRC we have a .editorconfig
> > file,
> > so this should be trivial.  So this should be in the "community
> > bonding" phase.
> > 
> > The other thing you might like to try is some of the other
> > subprojects
> > within https://gcc.gnu.org/wiki/StaticAnalyzer/CPython ; some of
> > these
> > are relatively easy compared to reference count checking, e.g.
> > "Verification of PyMethodDef tables" and "Checking arguments of
> > "call"
> > calls" (though note the word "relatively" here).
> > 
> > Hope this makes sense; let me know if you have questions.  I need
> > to
> > move on, but note I may have missed some things, so consider
> > running an
> > update past me.
> > 
> > Dave
> > 
> > 
> > > I really want to make sure my plan for the state machine over GIMPLE
> > > aligns with the new class API. If my approach is off base anywhere,
> > > please let me know so I can rewrite it before the deadline.
> > > 
> > > Working on this project is my absolute top priority right now, so I'm
> > > ready to iterate on this draft as much as needed to get it right.
> > > 
> > > Thanks again for the atoi patch review earlier this week!
> > > 
> > > Best,
> > > Saksham Gupta
> > 
> >
