Actually there is, at least for PyCharm. I opened a JIRA on it
(https://issues.apache.org/jira/browse/SPARK-17333). It describes two ways of
doing it (I also made a GitHub stub at:
https://github.com/assafmendelson/ExamplePysparkAnnotation). Unfortunately, I
never found the time to follow through.
That said, if we make a decision on the way to handle it, then I believe it
would be a good idea to start even with the bare minimum and continue to add to
it (so that many people can contribute). The code I added on
GitHub was basically just the things I needed.
To summarize, there are two main ways of doing it (at least in PyCharm):
1. Give the hints as part of the docstring for the function
2. Create files with the signatures only and mark them for PyCharm to use
The advantage of the first is that it is part of the code, which means it is
easier to keep up to date. The main issue is that supporting auto-generated
code (as is the case for most functions) can be a little awkward. This is
actually related to a separate issue: PyCharm marks most of these
functions as errors (i.e. pyspark.sql.functions.XXX is marked as not there...).
The advantage of the second is that it is completely separate, so messing around
with it cannot harm the main code. The disadvantages are that we would need to
maintain it manually and that, to use it in PyCharm, one needs to add the files to
the path (in PyCharm this means marking them as source; I am not sure how other
IDEs support this).
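As a minimal sketch of the two options (the function name add_prefix and the stub file name example.pyi are made up for illustration and are not part of PySpark), the first keeps the types in the docstring, while the second keeps a signature-only copy in a separate file:

```python
# Option 1: type hints live in the docstring, using reStructuredText fields
# (:type: / :rtype:) that PyCharm can read. The code itself stays valid on
# both Python 2 and Python 3.
def add_prefix(name, prefix="col_"):
    """Return ``name`` with ``prefix`` prepended.

    :param name: column name
    :type name: str
    :param prefix: text to prepend
    :type prefix: str
    :rtype: str
    """
    return prefix + name


# Option 2: a separate stub file holding signatures only. The string below is
# what a hypothetical ``example.pyi`` next to the module might contain; the
# bodies are replaced by ``...`` and nothing in it is ever executed.
STUB_SOURCE = '''
def add_prefix(name: str, prefix: str = ...) -> str: ...
'''

print(add_prefix("age"))
```

In both cases the runtime behaviour of add_prefix is unchanged; only the tooling sees the types.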
Lastly, I only tested these two solutions in PyCharm. I am not sure of their
support in other IDEs.
Thanks,
Assaf.
From: rxin [via Apache Spark Developers List]
[mailto:[email protected]]
Sent: Tuesday, May 23, 2017 1:10 PM
To: Mendelson, Assaf
Subject: Re: [PYTHON] PySpark typing hints
Seems useful to do. Is there a way to do this so it doesn't break Python 2.x?
On Sun, May 14, 2017 at 11:44 PM, Maciej Szymkiewicz <[hidden email]> wrote:
Hi everyone,
For the last few months I've been working on static type annotations for
PySpark. For those of you who are not familiar with the idea, typing hints
were introduced by PEP 484 (https://www.python.org/dev/peps/pep-0484/) and
further extended with PEP 526 (https://www.python.org/dev/peps/pep-0526/) with
the main goal of providing information required for static analysis. Right now
there are a few tools which support typing hints, including Mypy
(https://github.com/python/mypy) and PyCharm
(https://www.jetbrains.com/help/pycharm/2017.1/type-hinting-in-pycharm.html).
Type hints can be added using function annotations
(https://www.python.org/dev/peps/pep-3107/, Python 3 only), docstrings, or
source independent stub files
(https://www.python.org/dev/peps/pep-0484/#stub-files). Typing is optional,
gradual and has no runtime impact.
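A small sketch of what this looks like in practice (the function names here are invented for illustration). The first function uses Python 3 function annotations; the second uses the comment-based form that PEP 484 describes for code that must also run on Python 2, where the hint is an ordinary comment:

```python
from typing import List


# Python 3 only: hints written as function annotations (PEP 3107 / PEP 484).
def total_length(words: List[str]) -> int:
    return sum(len(w) for w in words)


# PEP 484 type comment: the hint is a plain comment, so the function
# definition itself stays valid Python 2 syntax.
def total_length_py2(words):
    # type: (List[str]) -> int
    return sum(len(w) for w in words)


# Annotations are stored as metadata on the function object.
print(total_length.__annotations__)

# They are not enforced at runtime: passing a tuple instead of a list still
# works, and only a static checker would be expected to comment on it.
print(total_length(("ab", "c")))
```

Running this on Python 2 would additionally need the typing backport for the import; that part is an assumption about the environment rather than something the sketch shows.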
At this moment I've annotated the majority of the API, including the majority of
pyspark.sql and pyspark.ml. At this moment the project is still
rough around the edges, and may result in both false positives and false
negatives, but I think it has become mature enough to be useful in practice.
The current version is compatible only with Python 3, but it is possible, with
some limitations, to backport it to Python 2 (though it is not on my todo list).
There are a number of possible benefits for PySpark users and developers:
* Static analysis can detect a number of common mistakes to prevent runtime
failures. Generic self is still fairly limited, so it is more useful with
DataFrames, Structured Streaming (SS) and ML than with RDDs or DStreams.
* Annotations can be used for documenting complex signatures
(https://git.io/v95JN) including dependencies on arguments and value
(https://git.io/v95JA).
* Detecting possible bugs in Spark (SPARK-20631).
* Showing API inconsistencies.
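To illustrate the first point with a sketch: Column and describe below are hypothetical stand-ins written for this example, not the PySpark classes or the actual annotations. With the signature annotated, a checker such as Mypy would be expected to flag the bad call before the code runs; without it, the mistake only shows up as a runtime failure.

```python
class Column:
    """Hypothetical stand-in for a column object (not pyspark.sql.Column)."""

    def __init__(self, name: str) -> None:
        self.name = name


def describe(column: Column) -> str:
    # Relies on the argument really being a Column.
    return "column " + column.name.upper()


print(describe(Column("age")))

# Common mistake: passing the bare column name instead of a Column. The
# annotation on ``describe`` lets a static checker report an incompatible
# argument type here; at runtime the call fails only when it is executed.
try:
    describe("age")
except AttributeError as exc:
    runtime_error = str(exc)
    print("runtime failure:", runtime_error)
```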
Roadmap
* Update the project to reflect Spark 2.2.
* Refine existing annotations.
If there is enough interest I am happy to contribute this back to Spark or
submit to Typeshed (https://github.com/python/typeshed - this would require a
formal ASF approval, and since Typeshed doesn't provide versioning, is probably
not the best option in our case).
Further information:
* https://github.com/zero323/pyspark-stubs - GitHub repository
* https://speakerdeck.com/marcobonzanini/static-type-analysis-for-robust-data-products-at-pydata-london-2017 - interesting presentation by Marco Bonzanini
--
Best,
Maciej
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/PYTHON-PySpark-typing-hints-tp21560p21612.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.