Yeah, I tend to be positive about leveraging Python type hints in general.
However, just to clarify, I don't think we should port the type hints into the main code yet. Instead, we could think about porting Maciej's work, the pyi files, as stubs. For now, I tend to think that adding type hints directly to the code makes it harder to backport or revert changes, and harder to discuss typing on its own, especially since typing is arguably still premature.

It is also interesting to look at how other projects did it. I took a look at PySpark's friends such as pandas and NumPy. It seems that:

- NumPy had it as a separate project, numpy-stubs, which was then successfully merged into the main project as pyi files.
- pandas: I don't see the work being done yet. I found an issue related to this, but it seems to be closed.

Another important concern might be generic typing, with Spark's DataFrame as an example. It looks like that is also one of pandas' concerns. For instance, how would we support variadic generic typing, for example DataFrame[int, str, str] or DataFrame[a: int, b: str, c: str]? Last time I checked, Python didn't support this. Presumably Python 3.6 to 3.8, at least, wouldn't support it. I am experimentally trying this in another project that I am working on, but it requires a bunch of hacks and doesn't play well with MyPy.

I don't have a strong feeling about it for now, though I tend to agree. If we do this, I would like to take a more conservative path, such as having some separation for now, e.g. a separate repo in Apache if feasible, or a separate module, and then see how it goes and whether users like it.

On Wed, Jul 22, 2020 at 6:10 AM, Driesprong, Fokko <fo...@driesprong.frl> wrote:

> Fully agree Holden, would be great to include the Outreachy project.
> Adding annotations is a very friendly way to get familiar with the codebase.
>
> I've also created a PR to see what's needed to get mypy in:
> https://github.com/apache/spark/pull/29180 From there on we can start
> adding annotations.
>
> Cheers, Fokko
>
> On Tue, Jul 21, 2020 at 21:40, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Yeah I think this could be a great project now that we're only Python
>> 3.5+. One potential is making this an Outreachy project to get more folks
>> from different backgrounds involved in Spark.
>>
>> On Tue, Jul 21, 2020 at 12:33 PM Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>>
>>> Since we've recently dropped support for Python <=3.5
>>> <https://github.com/apache/spark/pull/28957>, I think it would be nice
>>> to add support for type annotations. Having this in the main repository
>>> allows us to do type checking using MyPy <http://mypy-lang.org/> in the
>>> CI itself.
>>>
>>> This is now handled by the Stub file:
>>> https://www.python.org/dev/peps/pep-0484/#stub-files However I think it
>>> is nicer to integrate the types with the code itself to keep everything in
>>> sync, and make it easier for the people who work on the codebase itself. A
>>> first step would be to move the stubs into the codebase. First step would
>>> be to cover the public API which is the most important one. Having the
>>> types with the code itself makes it much easier to understand. For example,
>>> if you can supply a str or column here:
>>> https://github.com/apache/spark/pull/29122/files#diff-f5295f69bfbdbf6e161aed54057ea36dR2486
>>>
>>> One of the implications would be that future PR's on Python should cover
>>> annotations on the public API's. Curious what the rest of the community
>>> thinks.
>>>
>>> Cheers, Fokko
>>>
>>> On Tue, Jul 21, 2020 at 20:04, zero323 <mszymkiew...@gmail.com> wrote:
>>>
>>>> Given a discussion related to SPARK-32320 PR
>>>> <https://github.com/apache/spark/pull/29122> I'd like to resurrect
>>>> this
>>>> thread. Is there any interest in migrating annotations to the main
>>>> repository?
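
As a footnote to the variadic generic question in the reply at the top of this thread (DataFrame[int, str, str]), here is a minimal sketch of the kind of runtime hack that can make such a subscript work. It assumes Python 3.7+ for `__class_getitem__` (PEP 560); the `DataFrame` class below is a hypothetical stand-in and not `pyspark.sql.DataFrame`, and the `column_types` attribute is invented for illustration. It only records the types at runtime, so a static checker such as MyPy would not be expected to understand the result, which is roughly the limitation described above.

```python
from typing import Any


class DataFrame:
    """Hypothetical stand-in class, NOT pyspark.sql.DataFrame."""

    def __class_getitem__(cls, params: Any) -> type:
        # DataFrame[int] passes a single object; DataFrame[int, str] passes a tuple.
        if not isinstance(params, tuple):
            params = (params,)
        # Build a subclass that remembers the column types at runtime only.
        # "column_types" is an illustrative attribute name, not a real API.
        name = "DataFrame[{}]".format(", ".join(p.__name__ for p in params))
        return type(name, (cls,), {"column_types": params})


# Any number of type arguments is accepted, which typing.Generic with a
# fixed list of TypeVars cannot express.
Typed = DataFrame[int, str, str]
assert Typed.column_types == (int, str, str)
assert issubclass(Typed, DataFrame)
```

The design choice here is to return a plain subclass so that `isinstance` and `issubclass` checks keep working; the cost is that the column types are invisible to static analysis.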