jorisvandenbossche commented on issue #32609: URL: https://github.com/apache/arrow/issues/32609#issuecomment-2334113297
The end goal should be that we have rather complete type annotations for the users of pyarrow, i.e. with most of pyarrow being in cython. Thinking through some options for how to get there, and how to maintain and distribute those type stubs:

- We promote [pyarrow-stubs](https://github.com/zen-xu/pyarrow-stubs) and recommend that users install it (something we can already do now), and the development of those stubs is kept outside of pyarrow itself.
  - This of course has the problem of discoverability, and I assume that ideally we have at least basic type annotations directly included in pyarrow.
  - On the other hand, it does allow providing type annotations for already released pyarrow versions, can provide fixes faster, and can potentially also include more extensive typing (for example, I am not sure if we would want to include such extensive [`pa.array(..)` overloads](https://github.com/zen-xu/pyarrow-stubs/blob/be1a8a5949a63ba7059748fc908581c5900c0cbb/pyarrow-stubs/__lib_pxi/array.pyi#L56)). Of course, even if pyarrow installs type stubs itself, users could still go to pyarrow-stubs for more advanced stubs.
- We maintain something similar to `pyarrow-stubs` in pyarrow itself, i.e. hand-written stub files.
  - @zen-xu _if_ we would want to go this route, would you be interested in contributing (parts of) the stubs you wrote?
  - I think my main concern with this option is the maintenance effort involved.
    This relates to two aspects: the effort of writing and understanding the type annotations themselves (it introduces a lot of new concepts, but at the same time I assume that simpler stubs than what is now present in `pyarrow-stubs` would already be useful as well, and as mentioned above people who want more advanced stubs can always still install the external stubs), and additionally the effort of keeping those stubs in sync with the cython source code while changes are made (mypy's `stubtest` can catch some basic discrepancies, but I think that is limited).
- We include basic, auto-generated stub files (for the cython code; the python modules can have inline annotations).
  - This is what Dewey suggested above ([comment](https://github.com/apache/arrow/issues/32609#issuecomment-2112805759))
  - By being auto-generated, they will (at least initially) be simpler and less complete than the stubs provided by `pyarrow-stubs`
  - We can either add some basic type annotations (e.g. return types) in the cython code (the cython compilation will ignore those, but the resulting python function object will know about them, so we can use that to autogenerate stubs) or add some inline comments with type hints, to help with the autogeneration

Thoughts on this? Preferences? Other options you can think of?

---

Personally, I think that _if_ we have a decent solution for auto-generation that produces "good enough" stubs, that would be my preference. I have been looking into some of the existing (partial / abandoned) solutions, and I do think we can get something working short term. For example, I think it should be possible to get mypy's stubgen to recognize cython's `cyfunction` as a normal function with minimal patches. Otherwise a similar approach to [mpi4py](https://github.com/mpi4py/mpi4py/blob/master/conf/mpistubgen.py) could also work. Or the suggestion of adding inline comments with type hints and a script to extract those.
And longer term we could look into reviving the PR in the cython repo to generate stub files.
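
To make the auto-generation idea more concrete, here is a minimal sketch of reading annotations off runtime function objects and turning them into stub lines. It is only an illustration under assumptions: `stub_line`, `generate_stub`, `take` and `length` are hypothetical names, and plain Python functions stand in for compiled cython functions. Whether a real `cyfunction` exposes its signature to `inspect.signature` in the same way may depend on the cython version and compiler directives, so that part would need to be checked.

```python
import inspect


def stub_line(func) -> str:
    """Render a one-line stub for `func` from its runtime signature."""
    try:
        sig = str(inspect.signature(func))
    except (TypeError, ValueError):
        # No introspectable signature: fall back to a permissive stub.
        sig = "(*args, **kwargs)"
    return f"def {func.__name__}{sig}: ..."


def generate_stub(funcs) -> str:
    """Join the stub lines for an iterable of functions into .pyi-style text."""
    return "\n".join(stub_line(f) for f in funcs)


# Hypothetical stand-ins for functions compiled from a .pyx file
# (assumption: the compiled objects would carry the same annotations).
def take(values: list, indices: list, boundscheck: bool = True) -> list:
    return [values[i] for i in indices]


def length(values: list) -> int:
    return len(values)


stub_text = generate_stub([take, length])
print(stub_text)
```

A real generator would also need to handle classes, methods, properties and module-level constants, which is where tools such as mypy's stubgen or the mpi4py script linked above do most of their work.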
