jorisvandenbossche commented on issue #32609:
URL: https://github.com/apache/arrow/issues/32609#issuecomment-2334113297

   The end goal should be that users of pyarrow get fairly complete type 
annotations, even though most of pyarrow is written in cython. Thinking 
through some options for how to get there, and how to maintain and distribute 
those type stubs:
   
   - We promote [pyarrow-stubs](https://github.com/zen-xu/pyarrow-stubs) and 
recommend that users install it (something we can already do now), while the 
development of those stubs is kept outside of pyarrow itself. 
     - This of course has a discoverability problem, and I assume that 
ideally we would have at least basic type annotations included directly in pyarrow. 
     - On the other hand, it does allow providing type annotations for already 
released pyarrow versions, shipping fixes faster, and potentially including 
more extensive typing (for example, I am not sure we would want to include 
such extensive [`pa.array(..)` 
overloads](https://github.com/zen-xu/pyarrow-stubs/blob/be1a8a5949a63ba7059748fc908581c5900c0cbb/pyarrow-stubs/__lib_pxi/array.pyi#L56) 
in pyarrow itself). Of course, even if pyarrow ships its own type stubs, users 
could still go to pyarrow-stubs for more advanced stubs.
   - We maintain something similar to `pyarrow-stubs` in pyarrow itself, i.e. 
hand-written stub files.
     - @zen-xu _if_ we wanted to go this route, would you be interested in 
contributing (parts of) the stubs you wrote?
     - I think my main concern with this option is the maintenance effort 
involved, which has two aspects. First, the effort of writing and understanding 
the type annotations themselves: they introduce a lot of new concepts, although 
I assume that stubs simpler than what is now in `pyarrow-stubs` would already 
be useful, and, as mentioned above, people who want more advanced stubs can 
always still install the external stubs. Second, the effort of keeping those 
stubs in sync with the cython source code as changes are made (mypy's 
`stubtest` can catch some basic discrepancies, but I think its coverage is 
limited).
   - We include basic, auto-generated stub files (for the cython code, the 
python modules can have inline annotations).
     - This is what Dewey suggested above 
([comment](https://github.com/apache/arrow/issues/32609#issuecomment-2112805759))
     - By being auto-generated, they will (at least initially) be simpler and 
less complete than the stubs provided by `pyarrow-stubs`.
     - We can either add some basic type annotations (e.g. return types) in the 
cython code (the cython compilation will ignore those, but the resulting python 
function object will know about them, so we can use that to autogenerate stubs), 
or add inline comments with type hints to help with the autogeneration.
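A minimal sketch of that auto-generation route, assuming the cython functions are compiled with the `binding=True` directive so that they emulate Python functions for introspection (argument names, defaults, annotations): a stub line can then be rendered from `inspect.signature`. Note that `inspect.isfunction` returns `False` for cython's cyfunction objects, so the sketch uses a looser "callable but not a class" check. The helpers `is_function_like`, `stub_for_function` and `generate_stub` are hypothetical (not an existing pyarrow or mypy API), and a plain Python module stands in for a compiled one.

```python
import inspect
import types


def is_function_like(obj: object) -> bool:
    # inspect.isfunction() only accepts real Python functions; cyfunction
    # objects fail that check, so accept anything callable that is not a class.
    return callable(obj) and not isinstance(obj, type)


def stub_for_function(name: str, func: object) -> str:
    """Render a single `def` line of a .pyi stub from the runtime signature."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        # No introspectable signature: emit a permissive placeholder.
        return f"def {name}(*args, **kwargs): ..."
    return f"def {name}{sig}: ..."


def generate_stub(module: types.ModuleType) -> str:
    """Generate .pyi text for the public functions defined in `module`."""
    lines = []
    for name, obj in sorted(vars(module).items()):
        if name.startswith("_") or not is_function_like(obj):
            continue
        # Skip names that were merely imported from other modules.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        lines.append(stub_for_function(name, obj))
    return "\n".join(lines) + "\n"


# Stand-in for a compiled cython module: a plain module with annotated functions.
demo = types.ModuleType("demo")


def take(indices: list, boundscheck: bool = True) -> int: ...
def concat(chunks, *, promote=False): ...


for fn in (take, concat):
    fn.__module__ = "demo"  # pretend these were defined inside `demo`
    setattr(demo, fn.__name__, fn)

print(generate_stub(demo))
# def concat(chunks, *, promote=False): ...
# def take(indices: list, boundscheck: bool = True) -> int: ...
```

A real generator would additionally need to handle classes, methods, properties and module-level constants; this only shows the basic idea of reading annotations back from the function objects.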
   
   Thoughts on this? Preferences? Other options you can think of?
   
   ---
   
   Personally, I think that _if_ we have a decent solution for auto-generation 
that produces "good enough" stubs, that would be my preference. 
   I have been looking into some of the existing (partial / abandoned) 
solutions, and I do think we can get something working short term. For example, 
I think it should be possible to get mypy's stubgen working to recognize 
cython's `cyfunction` as a normal function with minimal patches. Otherwise, an 
approach similar to the one used by 
[mpi4py](https://github.com/mpi4py/mpi4py/blob/master/conf/mpistubgen.py) could 
also work. Another option is the suggestion of adding inline comments with type 
hints and a script to extract those. 
   And longer term we could look into reviving the PR in the cython repo to 
generate stub files.
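For the inline-comment option, a sketch of what the extraction script could look like, under a hypothetical convention: a `# stub:` comment directly above each `def` or `cdef class` in the `.pyx` source carries the signature to publish, and the script copies those lines, with their indentation, into the `.pyi` text. The marker, the `extract_stub_lines` helper and the `.pyx` snippet are all made up for illustration.

```python
import ast
import re

# Hypothetical marker: "# stub: <python signature>" placed directly above the
# cython definition it describes. The captured indentation keeps methods
# nested under their class in the generated stub.
STUB_MARKER = re.compile(r"^(?P<indent>\s*)#\s*stub:\s*(?P<sig>.+?)\s*$")


def extract_stub_lines(pyx_source: str) -> list[str]:
    """Collect the `# stub:` signatures from cython source text."""
    stubs = []
    for line in pyx_source.splitlines():
        match = STUB_MARKER.match(line)
        if match is None:
            continue
        sig = match["sig"]
        # Block openers such as "class Table:" are kept as-is; everything
        # else gets an ellipsis body, as is usual in .pyi files.
        body = sig if sig.endswith(":") else f"{sig}: ..."
        stubs.append(match["indent"] + body)
    return stubs


# Illustrative .pyx snippet (not actual pyarrow source).
PYX = """
# stub: def concat_tables(tables: list, promote: bool = False) -> Table
def concat_tables(list tables, bint promote=False):
    ...

# stub: class Table:
cdef class Table:
    # stub: def slice(self, offset: int = 0, length: int | None = None) -> Table
    def slice(self, offset=0, length=None):
        ...
"""

stub_text = "\n".join(extract_stub_lines(PYX)) + "\n"
ast.parse(stub_text)  # sanity check: the generated stub is valid Python syntax
print(stub_text)
```

Compared with runtime introspection, this keeps the published signature right next to the code it describes, but nothing checks that the comment still matches the actual cython signature, so a tool like `stubtest` would still be useful on top of it.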

