Hello GDAL developers,

Over the past weeks, while contributing to GDAL and working on Python binding-related issues and PRs, I have been studying the current Python stub generation pipeline in detail. In particular, I explored the docstub integration and the implementation in _analysis.py, _docstrings.py, and _stubs.py, along with recent PRs related to docstring cleanup and stub generation.

From examining the code, I understand that:

- .pyi files are generated entirely from docstrings using a custom Lark grammar.
- Type resolution is handled through TypeMatcher and import reconstruction.
- Unresolved types fall back to _typeshed.Incomplete.
- There is currently no mechanical validation step ensuring that generated stubs remain consistent with the actual runtime callable signatures produced by SWIG.

This means the stub layer is structurally decoupled from the runtime bindings, and drift between:

C++ → SWIG → Python runtime → docstrings → generated stubs

is theoretically possible without automated detection.

For GSoC, I would like to explore a project focused on hardening and modernizing this pipeline through runtime–stub consistency validation and stricter enforcement mechanisms.

A possible scope could include:

*Runtime–Stub Signature Validator*

- Import osgeo modules and inspect public callables using inspect.signature().
- Parse generated .pyi files.
- Detect mismatches in parameter names, counts, defaults, and return presence.
- Produce structured reports of inconsistencies.

*Stricter Stub Generation Mode*

- Optionally fail (or emit stronger diagnostics) on unresolved types instead of silently aliasing to _typeshed.Incomplete.
- Provide measurable metrics on annotation coverage and unresolved types.

*CI Integration*

- Integrate validation checks into CI to prevent silent drift over time.
- Keep the approach incremental and compatible with the existing docstring-driven workflow.

The goal would not be to redesign SWIG bindings or replace the current system, but to introduce a validation and enforcement layer that increases confidence in typing correctness, IDE support, and long-term maintainability of the Python bindings.

Before developing this into a formal proposal, I would really appreciate feedback on:

- Whether runtime–stub consistency validation aligns with current Python binding priorities.
- Whether there are known constraints or prior efforts in this direction.
- Whether this scope would be appropriate and realistic for a GSoC project.

Thank you very much for your time. I would be happy to refine or narrow this idea based on feedback.

Best regards,
Sionigdha
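
P.S. To make the validator idea more concrete, here is a minimal sketch of the comparison step. It is an illustration under stated assumptions, not code from the GDAL repository. The helper names (stub_params, runtime_params, find_mismatches) and the toy functions open_dataset and close_dataset are hypothetical. It compares parameter names only and runs against an in-memory stub string, so it does not need GDAL installed. In real use the namespace would presumably be something like vars() of an osgeo module, and SWIG wrappers may expose only generic *args/**kwargs signatures, in which case a name comparison alone would not be informative.

```python
import ast
import inspect


def stub_params(stub_source):
    """Map each top-level function in a .pyi source to its parameter names."""
    result = {}
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.FunctionDef):
            a = node.args
            names = [p.arg for p in a.posonlyargs + a.args]
            if a.vararg:
                names.append("*" + a.vararg.arg)
            names += [p.arg for p in a.kwonlyargs]
            if a.kwarg:
                names.append("**" + a.kwarg.arg)
            result[node.name] = names
    return result


def runtime_params(func):
    """Return the parameter names of a runtime callable, in declaration order."""
    names = []
    for p in inspect.signature(func).parameters.values():
        if p.kind is p.VAR_POSITIONAL:
            names.append("*" + p.name)
        elif p.kind is p.VAR_KEYWORD:
            names.append("**" + p.name)
        else:
            names.append(p.name)
    return names


def find_mismatches(namespace, stub_source):
    """Return {name: (runtime_names, stub_names)} where the two disagree.

    runtime_names is None when the stub declares a function that is
    missing (or not callable) at runtime.
    """
    report = {}
    for name, declared in stub_params(stub_source).items():
        func = namespace.get(name)
        if not callable(func):
            report[name] = (None, declared)
            continue
        try:
            actual = runtime_params(func)
        except (TypeError, ValueError):
            continue  # no introspectable signature; skipped in this sketch
        if actual != declared:
            report[name] = (actual, declared)
    return report


# Hypothetical stand-ins for runtime bindings (not real GDAL functions).
def open_dataset(path, mode="r"):
    return path, mode


def close_dataset(handle):
    return None


# Hypothetical stub text; close_dataset deliberately uses a drifted name.
STUB = '''
def open_dataset(path: str, mode: str = ...) -> object: ...
def close_dataset(ds: object) -> None: ...
'''

report = find_mismatches(
    {"open_dataset": open_dataset, "close_dataset": close_dataset}, STUB
)
print(report)  # {'close_dataset': (['handle'], ['ds'])}
```

A fuller version would also need to compare defaults and return annotations, walk classes and methods, and emit a structured report that CI can consume.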
