[ 
https://issues.apache.org/jira/browse/ARROW-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorrick Sleijster updated ARROW-17335:
--------------------------------------
    Description: 
As of Python 3.6, it has been possible to add typing information to the 
code. This became immensely popular in a short period of time. Shortly after, 
the tool `mypy` arrived, and it has become the industry standard for static 
type checking in Python. It checks for invalid types quickly enough to run as 
a pre-commit hook. It has caught many bugs that I did not see myself and has 
been a very valuable tool.

Now what does this mean for PyArrow?

When you run mypy on code that uses PyArrow, you get error messages such as 
the following:

```
some_util_using_pyarrow/hdfs_utils.py:5: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:9: error: Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker
some_util_using_pyarrow/hdfs_utils.py:11: error: Skipping analyzing "pyarrow.fs": module is installed, but missing library stubs or py.typed marker
```

More information is available here: 
[https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker]

You can solve this in three ways:
 # Ignore the message. This, however, makes mypy treat every PyArrow type as 
`Any`, so it cannot find user errors involving the PyArrow library.
 # Create a Python stub file. This used to be the standard, but it is no 
longer a popular option, because stubs are extra files that live next to the 
source code, whereas type hints can also be written inline in the code, which 
brings me to the third option.
 # Create a `py.typed` file and use inline type hints. This is the most popular 
option today because it requires no extra files (except for the `py.typed` 
file), keeps all the type hints with the code (they currently live in the 
documentation), and gives type hints (and in-IDE hinting of issues) not only 
to your users but also to the developers of the library themselves.
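
To make options 2 and 3 concrete, here is a minimal sketch of the two layouts. The package name `mypkg` and the function `row_count` are hypothetical and are not part of PyArrow; the script only writes files into a temporary directory to show what a package with inline hints and a `py.typed` marker (PEP 561) could look like.

```python
import tempfile
from pathlib import Path

# Option 3: the type hints live inline, next to the implementation.
INLINE_SOURCE = '''\
def row_count(path: str, skip_header: bool = True) -> int:
    """Count the rows in a text file (hypothetical example function)."""
    with open(path) as handle:
        total = sum(1 for _ in handle)
    return total - 1 if skip_header and total else total
'''

# Option 2: the same signature in a separate stub file (.pyi), without a body.
STUB_SOURCE = '''\
def row_count(path: str, skip_header: bool = ...) -> int: ...
'''


def write_typed_package(root: Path, name: str = "mypkg") -> Path:
    """Create a package with an inline-typed module and an empty py.typed marker."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(INLINE_SOURCE)
    # The marker file is empty; its presence tells type checkers to use the inline hints.
    (pkg / "py.typed").write_text("")
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    pkg = write_typed_package(Path(tmp))
    created = sorted(p.name for p in pkg.iterdir())
```

With option 2, `STUB_SOURCE` would be saved as a `.pyi` file alongside the module instead. In both cases the marker or stub files would also need to be shipped with the installed package for type checkers to find them.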

 

My personal preference already shines through in the options: it is option 3, 
as this quickly became the industry standard after its introduction.

I'd very much like to work on this; however, I don't want to waste time. 
Therefore, I am raising this ticket to see if this has been considered before 
or if we just haven't gotten to it yet.

I'd like to open the discussion here:
 # Do you agree with option 3 (a `py.typed` file with inline type hints)?
 # Should we remove the type annotations from the documentation, given that 
they will be inside the function signatures? Or should we keep them and also 
specify them in the code, which would duplicate the information?
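
To illustrate the duplication raised in the second question, here is a small sketch. The function `take_first` is hypothetical, and the numpydoc-style docstring is assumed here purely for illustration; the point is that the same type information appears once in the signature and once in the docstring.

```python
import inspect
from typing import List, get_type_hints


def take_first(values: List[int], count: int = 1) -> List[int]:
    """Return the first ``count`` items (hypothetical example function).

    Parameters
    ----------
    values : list of int
        Input values.
    count : int, default 1
        Number of items to keep.

    Returns
    -------
    list of int
    """
    return values[:count]


# The signature carries the types in machine-readable form ...
hints = get_type_hints(take_first)
# ... while the docstring repeats them as free text for human readers.
doc = inspect.getdoc(take_first)
```

Only the annotations in `hints` are visible to a type checker; the types written in `doc` have to be kept in sync by hand if both are retained.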

 


> [Python] Type checking support
> ------------------------------
>
>                 Key: ARROW-17335
>                 URL: https://issues.apache.org/jira/browse/ARROW-17335
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Jorrick Sleijster
>            Priority: Major
>   Original Estimate: 10h
>  Remaining Estimate: 10h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
