Oh, I think I caused some confusion here. Just to clarify, I wasn’t saying we must port this into a separate repo now. I was saying it is one of the options we can consider.

For a bit more context: this option was considered, roughly speaking, invalid, and it was thought it might need an incubation process as a separate project. After some investigation, I found that it is still a valid option: we can take this in as part of Apache Spark, but in a separate repo. FWIW, NumPy took this approach. They made a separate repo <https://github.com/numpy/numpy-stubs>, and merged it into the main repo <https://github.com/numpy/numpy> after it became stable.

My only major concerns are:

- The possibility of having to fundamentally change the approach in pyspark-stubs <https://github.com/zero323/pyspark-stubs>. It’s not because how it was done is wrong but because of how Python type hinting itself evolves.
- If my understanding is correct, pyspark-stubs <https://github.com/zero323/pyspark-stubs> is still incomplete and leaves types in some APIs unannotated (by using Any). Correct me if I am wrong, Maciej.

I’ll have a short sync with him and share what I learn so we can understand this better, since he probably knows the context of PySpark type hints best and I know some of the context in ASF and Apache Spark.

On Wed, Aug 5, 2020 at 6:31 AM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:

> Indeed, though the possible advantage is that in theory, you can have
> different release cycle than for the main repo (I am not sure if that's
> feasible in practice or if that was the intention).
>
> I guess all depends on how we envision the future of annotations
> (including, but not limited to, how conservative we want to be in the
> future). Which is probably something that should be discussed here.
> On 8/4/20 11:06 PM, Felix Cheung wrote:
>
> So IMO maintaining outside in a separate repo is going to be harder. That
> was why I asked.
>
> ------------------------------
> *From:* Maciej Szymkiewicz <mszymkiew...@gmail.com>
> *Sent:* Tuesday, August 4, 2020 12:59 PM
> *To:* Sean Owen
> *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; Spark
> Dev List
> *Subject:* Re: [PySpark] Revisiting PySpark type annotations
>
>
> On 8/4/20 9:35 PM, Sean Owen wrote
> > Yes, but the general argument you make here is: if you tie this
> > project to the main project, it will _have_ to be maintained by
> > everyone. That's good, but also exactly I think the downside we want
> > to avoid at this stage (I thought?) I understand for some
> > undertakings, it's just not feasible to start outside the main
> > project, but is there no proof of concept even possible before taking
> > this step -- which more or less implies it's going to be owned and
> > merged and have to be maintained in the main project.
>
>
> I think we have a bit different understanding here ‒ I believe we have
> reached a conclusion that maintaining annotations within the project is
> OK, we only differ when it comes to specific form it should take.
>
> As for the POC ‒ we have stubs, which have been maintained over three years
> now and cover versions between 2.3 (though these are fairly limited) to,
> with some lag, current master. There is some evidence they are used in
> the wild
> (
> https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D
> ),
> there are a few contributors
> (https://github.com/zero323/pyspark-stubs/graphs/contributors) and at
> least some use cases (https://stackoverflow.com/q/40163106/). So,
> subjectively speaking, it seems we're already beyond POC.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: A30CEF0C31A501EC
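
To make the second concern above (APIs annotated with Any) a bit more concrete, here is a minimal sketch. The function names and the Schema alias are hypothetical and are not taken from pyspark-stubs; the point is only that an Any annotation is accepted by a type checker without giving it anything to check.

```python
from typing import Any, List, Tuple, get_type_hints

# Hypothetical alias: (column name, type name) pairs. Not a PySpark type.
Schema = List[Tuple[str, str]]


def column_names_loose(schema: Any) -> Any:
    # Annotated with Any: a type checker accepts every call and infers nothing.
    return [name for name, _ in schema]


def column_names_precise(schema: Schema) -> List[str]:
    # Fully annotated: a checker can flag wrong arguments and misuse of the result.
    return [name for name, _ in schema]


schema = [("id", "bigint"), ("name", "string")]
loose = column_names_loose(schema)
precise = column_names_precise(schema)

# Both behave identically at runtime; only the static information differs.
print(get_type_hints(column_names_loose)["return"])
print(get_type_hints(column_names_precise)["return"])
```

Both functions return the same list at runtime, so replacing Any with a precise annotation later would not change behavior, only what a checker such as mypy is able to verify.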