Re: [PySpark] Revisiting PySpark type annotations

Hyukjin Kwon Tue, 04 Aug 2020 20:52:07 -0700

Oh, I think I caused some confusion here.
Just to clarify: I wasn’t saying we must port this into a separate
repo now, only that it is one of the options we could consider.


For a bit more context:
This option was considered, roughly speaking, an invalid option, since it
was thought to require an incubation process as a separate project.
After some investigation, I found that it is still a valid option: we can
keep this as part of Apache Spark, but in a separate repo.

FWIW, NumPy took this approach. They made a separate repo
<https://github.com/numpy/numpy-stubs> and merged it into the main repo
<https://github.com/numpy/numpy-stubs> after it became stable.


My only major concerns are:

   - the possibility of having to fundamentally change the approach in pyspark-stubs
   <https://github.com/zero323/pyspark-stubs>, not because the way it was
   done is wrong, but because Python type hinting itself is still evolving.
   - If my understanding is correct, pyspark-stubs
   <https://github.com/zero323/pyspark-stubs> is still incomplete and does
   not annotate types for some APIs (it falls back to Any). Correct me if I am
   wrong, Maciej.
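To make the second concern a bit more concrete: one rough way to gauge how much of a stub package still falls back to Any is to walk its .pyi sources with Python's standard ast module and list every function that uses a bare Any annotation. This is only a minimal sketch under stated assumptions: find_any_gaps and EXAMPLE_STUB are hypothetical names made up for illustration, and the stub text is NOT taken from pyspark-stubs.

```python
import ast

# Hypothetical stub fragment for illustration only; these are NOT the actual
# pyspark-stubs signatures.
EXAMPLE_STUB = '''
from typing import Any
class DataFrame:
    def limit(self, num: int) -> DataFrame: ...
    def toPandas(self) -> Any: ...
    def sample(self, *args: Any, **kwargs: Any) -> DataFrame: ...
def udf(f: Any = ..., returnType: Any = ...) -> Any: ...
'''


def _is_any(node):
    """Return True if an annotation node is spelled `Any` or `typing.Any`."""
    if isinstance(node, ast.Name):
        return node.id == "Any"
    if isinstance(node, ast.Attribute):
        return node.attr == "Any"
    return False


def _all_args(fn):
    """Yield every argument node of a function, including *args and **kwargs."""
    a = fn.args
    yield from a.posonlyargs
    yield from a.args
    if a.vararg:
        yield a.vararg
    yield from a.kwonlyargs
    if a.kwarg:
        yield a.kwarg


def find_any_gaps(source):
    """Return sorted qualified names of functions with a bare Any annotation."""
    gaps = set()

    def visit(body, prefix):
        for node in body:
            if isinstance(node, ast.ClassDef):
                # Recurse into class bodies so methods get a "Class." prefix.
                visit(node.body, prefix + node.name + ".")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                annotations = [arg.annotation for arg in _all_args(node)]
                annotations.append(node.returns)
                # Unannotated parameters (e.g. self) have annotation None.
                if any(ann is not None and _is_any(ann) for ann in annotations):
                    gaps.add(prefix + node.name)

    visit(ast.parse(source).body, "")
    return sorted(gaps)


print(find_any_gaps(EXAMPLE_STUB))
```

Run over each .pyi file in a package, this gives a crude coverage signal. It deliberately flags only a top-level Any, so nested uses such as List[Any] are not counted.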

I’ll have a short sync with him to understand this better and share the
outcome, since he probably knows the context of PySpark type hints best, and
I know some of the context in the ASF and Apache Spark.



On Wed, Aug 5, 2020 at 6:31 AM Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:

> Indeed, though the possible advantage is that, in theory, you can have a
> different release cycle from the main repo (I am not sure if that's
> feasible in practice or if that was the intention).
>
> I guess it all depends on how we envision the future of annotations
> (including, but not limited to, how conservative we want to be in the
> future), which is probably something that should be discussed here.
> On 8/4/20 11:06 PM, Felix Cheung wrote:
>
> So IMO maintaining it outside, in a separate repo, is going to be harder.
> That was why I asked.
>
>
>
> ------------------------------
> *From:* Maciej Szymkiewicz <mszymkiew...@gmail.com>
> *Sent:* Tuesday, August 4, 2020 12:59 PM
> *To:* Sean Owen
> *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; Spark
> Dev List
> *Subject:* Re: [PySpark] Revisiting PySpark type annotations
>
>
> On 8/4/20 9:35 PM, Sean Owen wrote:
> > Yes, but the general argument you make here is: if you tie this
> > project to the main project, it will _have_ to be maintained by
> > everyone. That's good, but I think it's also exactly the downside we
> > want to avoid at this stage (or so I thought?). I understand that for
> > some undertakings it's just not feasible to start outside the main
> > project, but is no proof of concept possible before taking this
> > step, which more or less implies it's going to be owned, merged, and
> > maintained in the main project?
>
>
> I think we have a slightly different understanding here. I believe we have
> reached the conclusion that maintaining annotations within the project is
> OK; we only differ on the specific form it should take.
>
> As for a POC: we have stubs, which have been maintained for over three
> years now and cover versions from 2.3 (though coverage there is fairly
> limited) up to, with some lag, current master. There is some evidence
> they are used in the wild
> (https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D),
> there are a few contributors
> (https://github.com/zero323/pyspark-stubs/graphs/contributors), and there
> are at least some use cases (https://stackoverflow.com/q/40163106/). So,
> subjectively speaking, it seems we're already beyond a POC.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: A30CEF0C31A501EC
>
>