[ 
https://issues.apache.org/jira/browse/ARROW-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577314#comment-17577314
 ] 

Joris Van den Bossche commented on ARROW-17335:
-----------------------------------------------

{quote}Well it's not really a duplicate of ARROW-8175.

The difference is that that ticket focuses on performing type 
checking on the PyArrow code base and ensuring all the types are valid inside 
the library.

My ticket is about using the PyArrow code base as a library and ensuring we can 
type check projects that are using PyArrow by using type annotations on 
functions specified inside the PyArrow codebase.{quote}

It's indeed not exactly the same. But _in practice_, I think both aspects are 
very much related and we could (should?) do them at the same time. If we start 
adding type annotations so that pyarrow can be used by other projects that are 
type-checked, it would be good to also _check_ at the same time that the 
type annotations we are adding are correct. That said, based on my limited 
experience with this, just running mypy on the code base only goes so far, I 
suppose: it doesn't guarantee that the annotations are actually correct; it can 
only find some of the incorrect ones.
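
To make that last point concrete, here is a minimal sketch of an annotation that a static checker would be expected to accept (with default settings) even though it is wrong at runtime. Both function names are hypothetical and not part of pyarrow; `untyped_backend` merely stands in for unannotated code, whose result a checker treats as `Any`.

```python
def untyped_backend(n):
    # Hypothetical stand-in for unannotated code: with no annotations,
    # a type checker treats the return value as Any.
    return str(n)


def num_rows(n: int) -> int:
    # The declared return type is int, but the value comes from untyped code.
    # Any is compatible with every type, so a checker with default settings
    # is not expected to complain, although a str is returned at runtime.
    return untyped_backend(n)


result = num_rows(3)
print(type(result).__name__)
```

Stricter settings (for example mypy's `--strict`) would likely flag the `Any` return, but in general the checker only verifies that annotations are consistent with each other, not that they match what the code does at runtime.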

 

> [Python] Type checking support
> ------------------------------
>
>                 Key: ARROW-17335
>                 URL: https://issues.apache.org/jira/browse/ARROW-17335
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Jorrick Sleijster
>            Priority: Major
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
> h1. mypy and static type checking
> As of Python 3.6, it has been possible to introduce typing information in the 
> code. This became immensely popular in a short period of time. Shortly after, 
> the tool `mypy` arrived and this has become the industry standard for static 
> type checking in Python. It checks for invalid types quickly enough to be 
> used as a pre-commit hook. It has surfaced many bugs that I did not see 
> myself and has been a very valuable tool.
> h2. Now what does this mean for PyArrow?
> When we run mypy on code that uses PyArrow, we get error messages like the 
> following:
> ```
> some_util_using_pyarrow/hdfs_utils.py:5: error: Skipping analyzing "pyarrow": 
> module is installed, but missing library stubs or py.typed marker
> some_util_using_pyarrow/hdfs_utils.py:9: error: Skipping analyzing "pyarrow": 
> module is installed, but missing library stubs or py.typed marker
> some_util_using_pyarrow/hdfs_utils.py:11: error: Skipping analyzing 
> "pyarrow.fs": module is installed, but missing library stubs or py.typed 
> marker
> ```
> More information is available here: 
> [https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker]
> h2. You can solve this in three ways:
>  # Ignore the message. This, however, turns all types from PyArrow into 
> `Any`, so mypy cannot find user errors involving the PyArrow library.
>  # Create a Python stub file. This used to be the standard, but it is no 
> longer a popular option, because stubs are extra files that live next to 
> the source code, while you can also put the type hints inline in the code, 
> which brings me to our third option.
>  # Create a `py.typed` file and use inline type hints. This is the most 
> popular option today because it requires no extra files (except for the 
> py.typed file), keeps all the type hints with the code (where they are 
> currently in the documentation), and provides type hints not only to your 
> users but also to the developers of the library themselves (including 
> hinting of issues inside your IDE).
>  
> My personal opinion already shines through in the options: it is option 3, 
> as this has quickly become the industry standard since its introduction.
> h2. What should we do?
> I'd very much like to work on this, however, I don't feel like wasting time. 
> Therefore, I am raising this ticket to see if this has been considered before 
> or if we just didn't get to this yet.
> I'd like to open the discussion here:
>  # Do you agree with option 3 (inline type hints)?
>  # Should we remove the type information from the documentation, given that 
> the type hints will be in the function signatures? Or should we keep it and 
> also specify it in the code, which would duplicate the information?
>  
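
For reference, option 3 in the quoted description could look roughly like the sketch below. `column_lengths` is a hypothetical function, not part of PyArrow; the point is only that the types live in the signature, so the docstring need not repeat them, and that they remain available at runtime via `typing.get_type_hints`. Separately, the package would ship an empty `py.typed` marker file (PEP 561) so that type checkers pick up the inline hints.

```python
from typing import Dict, List, get_type_hints


def column_lengths(columns: Dict[str, List[int]]) -> Dict[str, int]:
    """Return the number of values in each column.

    The parameter and return types are stated in the signature,
    so they are not duplicated here.
    """
    return {name: len(values) for name, values in columns.items()}


# The hints can be inspected at runtime, e.g. by documentation tooling.
hints = get_type_hints(column_lengths)
lengths = column_lengths({"a": [1, 2, 3], "b": []})
print(hints)
print(lengths)
```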



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
