On 7/22/20 3:45 AM, Hyukjin Kwon wrote:
>
> Yeah, I tend to be positive about leveraging the Python type hints in
> general.
>
> However, just to clarify, I don’t think we should port the type
> hints into the main code yet, but rather think about
> porting Maciej's work, the pyi files, as stubs. For now, I tend to
> think adding type hints to the code makes it difficult to backport or
> revert and
>
That's probably a one-time overhead, so it is not a big issue. In my
opinion, a bigger one is the possible complexity. Annotations tend to
introduce a lot of cyclic dependencies in the Spark codebase. This can be
addressed, but it doesn't look great.
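
One common way to address such cycles is to guard annotation-only imports
with typing.TYPE_CHECKING and use string annotations. A minimal sketch,
assuming hypothetical names (hypothetical_pkg.frame, Frame and Col are
illustrative and not taken from Spark):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Executed only by static type checkers, never at runtime, so this
    # import cannot create an import cycle. The module path is hypothetical.
    from hypothetical_pkg.frame import Frame


class Col:
    """Stand-in for a class whose methods return a type defined elsewhere."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_frame(self) -> "Frame":  # string annotation: not evaluated at runtime
        raise NotImplementedError
```

The guarded import avoids the cycle, but it adds the kind of boilerplate
that makes annotated code look less tidy.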

Merging stubs into the project structure, on the other hand, has almost no
overhead.
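
For reference, a stub is a .pyi file containing only signatures with "..."
bodies, kept alongside the implementation. A minimal sketch, assuming a
hypothetical module with a Column class and an upper function (names and
signatures chosen for illustration, not copied from the actual stubs):

```python
# Contents of a hypothetical stub file, e.g. functions.pyi.
# Stubs use ordinary Python syntax, so this also runs as a plain module.
from typing import Union


class Column:
    def alias(self, name: str) -> "Column": ...


# Accepts either an existing Column or a column name given as str.
def upper(col: Union[Column, str]) -> Column: ...
```

Because the stub lives in its own file, the implementation module does not
change at all when it is added.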

> more difficult to discuss typing on its own, especially considering
> typing is arguably still premature.
>
> It is also interesting to take a look at other projects and how they
> did it. I took a look at PySpark's peers
> such as pandas and NumPy. It seems:
>
>   * NumPy had it as a separate project, numpy-stubs, which was
>     successfully merged into the main project as pyi files.
>   * For pandas, I don’t see the work being done yet. I found an issue
>     related to this but it seems closed.
>
Actually there is quite a lot of ongoing work.
https://github.com/pandas-dev/pandas/issues/28142 is one ticket, but
individual work is handled separately (quite a few core modules already
have decent annotations). That being said, it seems unlikely that this
will be considered stable any time soon.

> Another important concern might be generic typing in Spark’s DataFrame,
> for example. It looks like that’s also one of pandas’ concerns.
> For instance, how would we support variadic generic typing, for
> example, |DataFrame[int, str, str]| or |DataFrame[a: int, b: str, c:
> str]| ?
> Last time I checked, Python didn’t support this. Presumably Python
> 3.6 through 3.8, at least, don't support it.
> I am experimentally trying this in another project that I am working
> on but it requires a bunch of hacks and doesn’t play well with MyPy.
>
It doesn't, but considering the structure of the API, I am not sure how
useful this would be in the first place. Additionally, generics are
somewhat limited anyway ‒ even in the best case scenario you can re

In practice, the biggest advantage is actually support for completion,
not type checking (which works in simple cases).
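
To illustrate the limitation: typing.Generic can express a fixed number of
type parameters, but not an arbitrary-length list such as
DataFrame[int, str, str] for any number of columns. A minimal sketch,
assuming a hypothetical two-column Frame2 class (not part of PySpark):

```python
from typing import Generic, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Frame2(Generic[A, B]):
    """Hypothetical frame with exactly two typed columns."""

    def __init__(self, rows: List[Tuple[A, B]]) -> None:
        self.rows = rows

    def first(self) -> Tuple[A, B]:
        # A type checker infers Tuple[int, str] for a Frame2[int, str].
        return self.rows[0]


frame = Frame2([(1, "a"), (2, "b")])  # inferred as Frame2[int, str]
row = frame.first()
```

Supporting three or more columns this way would need a separate class per
arity, and none of it is enforced at runtime.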

>  
> I don't have a strong feeling about it for now, though I tend
> to agree.
> If we do this, I would like to take a more conservative path,
> such as having some separation
> for now, e.g. a separate repo in Apache if feasible, or a separate module,
> and then see how it goes and whether users like it.
>
As said before ‒ I am happy to transfer ownership of the stubs to the ASF if
there is a will to maintain these (either as a standalone or an inlined variant).

However, I am strongly against adding random annotations to the codebase
over a prolonged period, as this is likely to break existing type hints (there
is limited support for merging, but it doesn't work well), with no
obvious replacement soon.

If merging or transferring ownership is not an option, more involvement
from the contributors would be more than enough to reduce maintenance
overhead and provide some opportunity for knowledge transfer and such.

>
>
> On Wed, Jul 22, 2020 at 6:10 AM, Driesprong, Fokko
> <fo...@driesprong.frl> wrote:
>
>     Fully agree Holden, would be great to include the Outreachy
>     project. Adding annotations is a very friendly way to get familiar
>     with the codebase.
>
>     I've also created a PR to see what's needed to get mypy
>     in: https://github.com/apache/spark/pull/29180 From there on we
>     can start adding annotations.
>
>     Cheers, Fokko
>
>
>     On Tue, Jul 21, 2020 at 21:40, Holden Karau
>     <hol...@pigscanfly.ca <mailto:hol...@pigscanfly.ca>> wrote:
>
>         Yeah I think this could be a great project now that we're only
>         Python 3.5+. One possibility is making this an Outreachy project
>         to get more folks from different backgrounds involved in Spark.
>
>         On Tue, Jul 21, 2020 at 12:33 PM Driesprong, Fokko
>         <fo...@driesprong.frl> wrote:
>
>             Since we've recently dropped support for Python <=3.5
>             <https://github.com/apache/spark/pull/28957>, I think it
>             would be nice to add support for type annotations. Having
>             this in the main repository allows us to do type checking
>             using MyPy <http://mypy-lang.org/> in the CI itself.
>
>             This is currently handled by stub
>             files: https://www.python.org/dev/peps/pep-0484/#stub-files However,
>             I think it is nicer to integrate the types with the code
>             itself to keep everything in sync, and to make it easier for
>             the people who work on the codebase itself. A first step
>             would be to move the stubs into the codebase, starting
>             with the public API, which is the most
>             important one. Having the types with the code itself makes
>             it much easier to understand, for example, whether you can
>             supply a str or a column
>             here:
> https://github.com/apache/spark/pull/29122/files#diff-f5295f69bfbdbf6e161aed54057ea36dR2486
>
>             One of the implications would be that future PRs on
>             Python should cover annotations on the public APIs.
>             Curious what the rest of the community thinks.
>
>             Cheers, Fokko
>
>             On Tue, Jul 21, 2020 at 20:04, zero323
>             <mszymkiew...@gmail.com <mailto:mszymkiew...@gmail.com>> wrote:
>
>                 Given a discussion related to  SPARK-32320 PR
>                 <https://github.com/apache/spark/pull/29122>   I'd
>                 like to resurrect this
>                 thread. Is there any interest in migrating annotations
>                 to the main
>                 repository?
>
>
>
>
>
>
>
-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: A30CEF0C31A501EC
