No worries, thanks for the update!

On Thu, Aug 20, 2020 at 12:50, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Yeah, we had a short meeting. I had to check a few other things, so some
> delays happened. I will share soon.
>
> On Thu, Aug 20, 2020 at 7:14 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
>> Hi Maciej, Hyukjin,
>>
>> Did you find any time to discuss adding the types to the Python
>> repository? Would love to know what came out of it.
>>
>> Cheers, Fokko
>>
>> On Wed, Aug 5, 2020 at 10:14, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>
>>> Mostly echoing stuff that we've discussed in
>>> https://github.com/apache/spark/pull/29180, but good to have this also
>>> on the dev-list.
>>>
>>> > So IMO maintaining outside in a separate repo is going to be harder.
>>> > That was why I asked.
>>>
>>> I agree with Felix, having this inside of the project would make it much
>>> easier to maintain. Having it inside of the ASF might make it easier to
>>> port the pyi files to the actual Spark repository.
>>>
>>> > FWIW, NumPy took this approach. They made a separate repo, and merged
>>> > it into the main repo after it became stable.
>>>
>>> As Maciej pointed out:
>>>
>>> > As of POC ‒ we have stubs, which have been maintained over three years
>>> > now and cover versions between 2.3 (though these are fairly limited) to,
>>> > with some lag, current master.
>>>
>>> What would be required to mark it as stable?
>>>
>>> > I guess all depends on how we envision the future of annotations
>>> > (including, but not limited to, how conservative we want to be in the
>>> > future). Which is probably something that should be discussed here.
>>>
>>> I'm happy to motivate people to contribute type hints, and I believe it
>>> is a very accessible way to get more people involved in the Python
>>> codebase. Using the ASF model we can ensure that we require committers/PMC
>>> to sign off on the annotations.
>>>
>>> > Indeed, though the possible advantage is that in theory, you can have
>>> > different release cycle than for the main repo (I am not sure if that's
>>> > feasible in practice or if that was the intention).
>>>
>>> Personally, I don't think we need a different cycle if the type
>>> hints are part of the code itself.
>>>
>>> > If my understanding is correct, pyspark-stubs is still incomplete and
>>> > does not annotate types in some other APIs (by using Any). Correct me if I
>>> > am wrong, Maciej.
>>>
>>> For me, it is a bit like code coverage. You want this to be high to make
>>> sure that you cover most of the APIs, but it will take some time to make it
>>> complete.
>>>
>>> For me, it feels a bit like a chicken and egg problem. Because the type
>>> hints are in a separate repository, they will always lag behind. Also, it
>>> is harder to spot where the gaps are.
>>>
>>> Cheers, Fokko
>>>
>>> On Wed, Aug 5, 2020 at 05:51, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> Oh I think I caused some confusion here.
>>>> Just for clarification, I wasn’t saying we must port this into a
>>>> separate repo now. I was saying it can be one of the options we can
>>>> consider.
>>>>
>>>> For a bit more context:
>>>> This option was considered as, roughly speaking, an invalid option and
>>>> it might need an incubation process as a separate project.
>>>> After some investigations, I found that this is still a valid option
>>>> and we can take this as the part of Apache Spark but in a separate repo.
>>>>
>>>> FWIW, NumPy took this approach. They made a separate repo
>>>> <https://github.com/numpy/numpy-stubs>, and merged it into the main
>>>> repo <https://github.com/numpy/numpy-stubs> after it became stable.
>>>>
>>>> My only major concerns are:
>>>>
>>>> - the possibility to fundamentally change the approach in
>>>>   pyspark-stubs <https://github.com/zero323/pyspark-stubs>.
>>>>   It’s not because how it was done is wrong but because of how Python
>>>>   type hinting itself evolves.
>>>>
>>>> - If my understanding is correct, pyspark-stubs
>>>>   <https://github.com/zero323/pyspark-stubs> is still incomplete and
>>>>   does not annotate types in some other APIs (by using Any). Correct me
>>>>   if I am wrong, Maciej.
>>>>
>>>> I’ll have a short sync with him and share to understand better since
>>>> he’d probably know the context best in PySpark type hints and I know some
>>>> contexts in ASF and Apache Spark.
>>>>
>>>> On Wed, Aug 5, 2020 at 6:31 AM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:
>>>>
>>>>> Indeed, though the possible advantage is that in theory, you can
>>>>> have different release cycle than for the main repo (I am not sure
>>>>> if that's feasible in practice or if that was the intention).
>>>>>
>>>>> I guess all depends on how we envision the future of annotations
>>>>> (including, but not limited to, how conservative we want to be in
>>>>> the future). Which is probably something that should be discussed
>>>>> here.
>>>>>
>>>>> On 8/4/20 11:06 PM, Felix Cheung wrote:
>>>>>
>>>>> So IMO maintaining outside in a separate repo is going
>>>>> to be harder. That was why I asked.
>>>>>
>>>>> ------------------------------
>>>>> *From:* Maciej Szymkiewicz <mszymkiew...@gmail.com>
>>>>> *Sent:* Tuesday, August 4, 2020 12:59 PM
>>>>> *To:* Sean Owen
>>>>> *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko;
>>>>> Holden Karau; Spark Dev List
>>>>> *Subject:* Re: [PySpark] Revisiting PySpark type annotations
>>>>>
>>>>> On 8/4/20 9:35 PM, Sean Owen wrote:
>>>>> > Yes, but the general argument you make here is: if you tie this
>>>>> > project to the main project, it will _have_ to be maintained by
>>>>> > everyone. That's good, but also exactly I think the downside we want
>>>>> > to avoid at this stage (I thought?) I understand for some
>>>>> > undertakings, it's just not feasible to start outside the main
>>>>> > project, but is there no proof of concept even possible before taking
>>>>> > this step -- which more or less implies it's going to be owned and
>>>>> > merged and have to be maintained in the main project.
>>>>>
>>>>> I think we have a bit different understanding here ‒ I believe we have
>>>>> reached a conclusion that maintaining annotations within the project is
>>>>> OK, we only differ when it comes to specific form it should take.
>>>>>
>>>>> As of POC ‒ we have stubs, which have been maintained over three years
>>>>> now and cover versions between 2.3 (though these are fairly limited) to,
>>>>> with some lag, current master. There is some evidence they are used in
>>>>> the wild
>>>>> (https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D),
>>>>> there are a few contributors
>>>>> (https://github.com/zero323/pyspark-stubs/graphs/contributors) and at
>>>>> least some use cases (https://stackoverflow.com/q/40163106/). So,
>>>>> subjectively speaking, it seems we're already beyond POC.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Maciej Szymkiewicz
>>>>>
>>>>> Web: https://zero323.net
>>>>> Keybase: https://keybase.io/zero323
>>>>> Gigs: https://www.codementor.io/@zero323
>>>>> PGP: A30CEF0C31A501EC