On Sun, 2026-03-15 at 12:23 +0530, Saksham Gupta wrote:
> Hi David,
>
> I’ve attached the draft of my GSoC proposal for the CPython API
> checker. I haven't submitted it to the official portal yet; I wanted
> to run it by you first to catch any mistakes and make sure the
> technical direction actually makes sense.
>
> I made sure to include your recent advice. The scope now explicitly
> targets Python 3.11+ to handle the PEP 683 changes. My Compile Farm
> account (am-saksham) is also fully set up, so I added that to the
> testing strategy, along with a quick example of handling CFG
> bifurcation for PyList_New failures.

Hi Saksham

> If you have a few minutes next week, I’d love your brutal honesty on
> this.

Challenge accepted :)

One thing that might not be mentioned yet on the wiki page is that the
existing plugin is the result of a previous GSoC project (by Eric Feng,
in 2023):
  https://summerofcode.withgoogle.com/archive/2023/projects/EzIUWs5x
  https://gist.github.com/efric/9faa9cb19fe829b97a54d5c7eabf5e72
(I've added a link to the wiki)

You should update the wording of your proposal to mention this (and
e.g. how 3.11 broke the old code).

Re 1. Abstract: it's probably worth noting that there are multiple ways
to interface CPython with C: using libffi, using a binding generator
(such as Cython), or writing C by hand.  This project focuses on the
"writing C by hand" case, but we don't recommend that people use this
approach; this is more about supporting legacy code.

Re 2. Motivation & Background:

"Crucially, the analysis will explicitly target CPython 3.11+ headers
as a baseline. This ensures accurate struct layouts,":
a nitpick: note that we don't want to have to care about precise
in-memory layouts; GCC's C frontend does this for us.  What we care
about is what fields there are and what their types are.  The
region_model/store.cc code does track things in terms of bit offsets,
so we'll see those when debugging, but the plugin should be written in
terms of types and fields.

"this project will integrate Python-specific domain knowledge directly
into the analyzer core."
Really?  I was thinking that it's best to keep this as a plugin, albeit
an in-tree plugin.

"Crucially, the analysis will explicitly target CPython 3.11+ headers
as a baseline."
Note that there have been other recent changes beyond PEP 683, as
CPython developers have tried to optimize more aggressively than in the
past (e.g. for JIT compilation).  The most recent release is 3.14, and
that might well have other changes that the plugin needs to be aware
of.
The ideal would be to support a wide range of 3.* headers, but it's
good to pick one and get that working first, to avoid getting swamped
by compatibility concerns.

"Illustrative Example: The Silent Leak": looks good.

Re 3.2. Phase 2: Implementing the Reference Count State Machine:

Your implementation plan is rather different to what we tried before,
in that you're proposing using a state_machine subclass to associate
state with a pointer.  What we tried in 2023 was to count the number of
stored pointers pointing at each PyObject, compare that count against
the ob_refcnt, and complain at certain points when the two got out of
sync (e.g. when the stack frame is popped).  This worked purely with
the region_model/state code and didn't need a new state_machine.

That approach did seem to work with the pre-PEP-683 implementation, but
IIRC Eric got stuck spending a lot of his time on PyList_Append, and
thus we only got a tiny subset of the API covered - but it did work.

Py_INCREF and Py_DECREF are typically macros, so by the time the
analyzer "sees" the user's code, all we see are reference count
increments, decrements, and conditionals, and this is captured for us
in the store by the region_model code; I think it would be hard to
implement using a state_machine (though maybe I'm wrong).

Note that there's a huge amount of repetition in the API (e.g.
"succeeds, returning a new reference, or fails, returning null" is a
very common pattern).  So please make plenty of use of helper
subroutines, or the attributes idea described on the project wiki page.

Re "DejaGnu Regression Suite", and in particular "the ascii-art
execution paths": note that these tests tend to be "brittle", so we
don't want many tests expressed this way, if any at all - dg-warning
and dg-message tend to be much more robust.

Re "5. Timeline & Milestones (350 Hours)":

I suggest dropping the mentions of the state_machine approach, which
means this section will need a rewrite.
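
Going back to the 3.2 comments: the pointer-counting idea can be
sketched as a toy model in plain C.  This is only an illustration under
stated assumptions, not the 2023 plugin code and not the analyzer's
API; every name in it (toy_object, toy_frame, toy_refcnt_mismatch, and
so on) is hypothetical.  It just shows the comparison between "how many
stored pointers refer to this object" and "what ob_refcnt says".

```c
/* Toy model only: this is NOT GCC analyzer code and does NOT use the
   CPython API.  Every name here (toy_object, toy_frame, ...) is
   hypothetical.  */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 8

/* Stand-in for a PyObject: only the reference count matters here.  */
struct toy_object
{
  long ob_refcnt;
};

/* The pointers the model currently knows about (e.g. locals in a
   stack frame, or fields of other objects).  */
struct toy_frame
{
  const struct toy_object *slots[TOY_MAX_SLOTS];
  size_t num_slots;
};

/* Record that one more pointer to OBJ is being stored.  */
static void
toy_frame_store (struct toy_frame *frame, const struct toy_object *obj)
{
  assert (frame->num_slots < TOY_MAX_SLOTS);
  frame->slots[frame->num_slots++] = obj;
}

/* Count how many stored pointers refer to OBJ.  */
static long
toy_count_pointers (const struct toy_frame *frame,
                    const struct toy_object *obj)
{
  long count = 0;
  for (size_t i = 0; i < frame->num_slots; i++)
    if (frame->slots[i] == obj)
      count++;
  return count;
}

/* Return ob_refcnt minus the number of stored pointers:
   0 means in sync, > 0 suggests a leak, < 0 suggests a missing
   increment (or an extra decrement).  */
static long
toy_refcnt_mismatch (const struct toy_frame *frame,
                     const struct toy_object *obj)
{
  return obj->ob_refcnt - toy_count_pointers (frame, obj);
}
```

In this model, incrementing ob_refcnt without storing a corresponding
pointer gives a nonzero mismatch, which is the kind of out-of-sync
state that would be reported at a check point such as a frame being
popped.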

I like the idea of building up a suite of buggy extensions.  You'll
want most of them to be as simple as possible, along with some larger
examples for "integration testing".

I recommend early on categorizing the API into the various patterns of
ownership/borrowing/stealing etc, and identifying examples of each, and
trying a simple example of each early on, to verify that the overall
approach will work on all the cases.

I don't like "strict formatting to GNU coding standard" being done at
the end.  Better to set up your editor early on to adhere to these, and
then have this happen throughout.  IIRC we have a .editorconfig file,
so this should be trivial.  So this should be in the "community
bonding" phase.

The other thing you might like to try is some of the other subprojects
within https://gcc.gnu.org/wiki/StaticAnalyzer/CPython ; some of these
are relatively easy compared to reference count checking, e.g.
"Verification of PyMethodDef tables" and "Checking arguments of "call"
calls" (though note the word "relatively" here).

Hope this makes sense; let me know if you have questions.  I need to
move on, but note I may have missed some things, so consider running an
update past me.

Dave

> I really want to make sure my plan for the state machine over GIMPLE
> aligns with the new class api. If my approach is off base anywhere,
> please let me know so I can rewrite it before the deadline.
>
> Working on this project is my absolute top priority right now, so I'm
> ready to iterate on this draft as much as needed to get it right.
>
> Thanks again for the atoi patch review earlier this week!
>
> Best,
> Saksham Gupta
