Yeah, we had a short meeting. I had to check a few other things, so there were some delays. I will share an update soon.
On Thu, Aug 20, 2020 at 7:14 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:

> Hi Maciej, Hyukjin,
>
> Did you find any time to discuss adding the types to the Python
> repository? Would love to know what came out of it.
>
> Cheers, Fokko
>
> On Wed, Aug 5, 2020 at 10:14 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
>> Mostly echoing stuff that we've discussed in
>> https://github.com/apache/spark/pull/29180, but good to have this also
>> on the dev-list.
>>
>> > So IMO maintaining outside in a separate repo is going to be harder.
>> > That was why I asked.
>>
>> I agree with Felix: having this inside of the project would make it much
>> easier to maintain. Having it inside of the ASF might also make it easier
>> to port the pyi files to the actual Spark repository.
>>
>> > FWIW, NumPy took this approach. they made a separate repo, and merged
>> > it into the main repo after it became stable.
>>
>> As Maciej pointed out:
>>
>> > As of POC ‒ we have stubs, which have been maintained over three years
>> > now and cover versions between 2.3 (though these are fairly limited) to,
>> > with some lag, current master.
>>
>> What would be required to mark it as stable?
>>
>> > I guess all depends on how we envision the future of annotations
>> > (including, but not limited to, how conservative we want to be in the
>> > future). Which is probably something that should be discussed here.
>>
>> I'm happy to motivate people to contribute type hints, and I believe it
>> is a very accessible way to get more people involved in the Python
>> codebase. Using the ASF model we can ensure that we require committers/PMC
>> to sign off on the annotations.
>>
>> > Indeed, though the possible advantage is that in theory, you can have
>> > a different release cycle than for the main repo (I am not sure if that's
>> > feasible in practice or if that was the intention).
>>
>> Personally, I don't think we need a different cycle if the type hints are
>> part of the code itself.
>>
>> > If my understanding is correct, pyspark-stubs is still incomplete and
>> > does not annotate types in some other APIs (by using Any). Correct me if I
>> > am wrong, Maciej.
>>
>> For me, it is a bit like code coverage. You want this to be high to make
>> sure that you cover most of the APIs, but it will take some time to make it
>> complete.
>>
>> It also feels a bit like a chicken-and-egg problem. Because the type
>> hints are in a separate repository, they will always lag behind. Also, it
>> is harder to spot where the gaps are.
>>
>> Cheers, Fokko
>>
>> On Wed, Aug 5, 2020 at 5:51 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Oh, I think I caused some confusion here.
>>> Just for clarification, I wasn't saying we must port this into a
>>> separate repo now. I was saying it can be one of the options we can
>>> consider.
>>>
>>> For a bit more context:
>>> This option was previously considered, roughly speaking, an invalid
>>> option that might need an incubation process as a separate project.
>>> After some investigation, I found that it is still a valid option and
>>> we can take this as part of Apache Spark but in a separate repo.
>>>
>>> FWIW, NumPy took this approach: they made a separate repo
>>> <https://github.com/numpy/numpy-stubs>, and merged it into the main repo
>>> <https://github.com/numpy/numpy> after it became stable.
>>>
>>> My only major concerns are:
>>>
>>> - the possibility of fundamentally changing the approach in
>>>   pyspark-stubs <https://github.com/zero323/pyspark-stubs>. It's not
>>>   because how it was done is wrong, but because of how Python type
>>>   hinting itself evolves.
>>> - If my understanding is correct, pyspark-stubs
>>>   <https://github.com/zero323/pyspark-stubs> is still incomplete and
>>>   does not annotate types in some other APIs (by using Any). Correct me
>>>   if I am wrong, Maciej.
>>>
>>> I'll have a short sync with him and share what comes out of it, since
>>> he'd probably know the context best in PySpark type hints, and I know
>>> some of the context in the ASF and Apache Spark.
>>>
>>> On Wed, Aug 5, 2020 at 6:31 AM, Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:
>>>
>>>> Indeed, though the possible advantage is that in theory, you can have
>>>> a different release cycle than for the main repo (I am not sure if that's
>>>> feasible in practice or if that was the intention).
>>>>
>>>> I guess it all depends on how we envision the future of annotations
>>>> (including, but not limited to, how conservative we want to be in the
>>>> future). Which is probably something that should be discussed here.
>>>>
>>>> On 8/4/20 11:06 PM, Felix Cheung wrote:
>>>>
>>>> So IMO maintaining outside in a separate repo is going to be harder.
>>>> That was why I asked.
>>>>
>>>> ------------------------------
>>>> *From:* Maciej Szymkiewicz <mszymkiew...@gmail.com>
>>>> *Sent:* Tuesday, August 4, 2020 12:59 PM
>>>> *To:* Sean Owen
>>>> *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau;
>>>> Spark Dev List
>>>> *Subject:* Re: [PySpark] Revisiting PySpark type annotations
>>>>
>>>> On 8/4/20 9:35 PM, Sean Owen wrote:
>>>> > Yes, but the general argument you make here is: if you tie this
>>>> > project to the main project, it will _have_ to be maintained by
>>>> > everyone. That's good, but also exactly I think the downside we want
>>>> > to avoid at this stage (I thought?) I understand for some
>>>> > undertakings, it's just not feasible to start outside the main
>>>> > project, but is there no proof of concept even possible before taking
>>>> > this step -- which more or less implies it's going to be owned and
>>>> > merged and have to be maintained in the main project.
>>>>
>>>> I think we have a bit different understanding here ‒ I believe we have
>>>> reached a conclusion that maintaining annotations within the project is
>>>> OK; we only differ when it comes to the specific form it should take.
>>>>
>>>> As for the POC ‒ we have stubs, which have been maintained for over
>>>> three years now and cover versions from 2.3 (though these are fairly
>>>> limited) to, with some lag, current master. There is some evidence they
>>>> are used in the wild
>>>> (https://github.com/zero323/pyspark-stubs/network/dependents?package_id=UGFja2FnZS02MzU1MTc4Mg%3D%3D),
>>>> there are a few contributors
>>>> (https://github.com/zero323/pyspark-stubs/graphs/contributors), and at
>>>> least some use cases (https://stackoverflow.com/q/40163106/). So,
>>>> subjectively speaking, it seems we're already beyond POC.
>>>>
>>>> --
>>>> Best regards,
>>>> Maciej Szymkiewicz
>>>>
>>>> Web: https://zero323.net
>>>> Keybase: https://keybase.io/zero323
>>>> Gigs: https://www.codementor.io/@zero323
>>>> PGP: A30CEF0C31A501EC
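[Editor's note] For readers following the "incomplete annotations (by using Any)" point in the thread, here is a minimal, hypothetical sketch (these are illustrative functions, not actual PySpark or pyspark-stubs signatures) of the difference between a precise annotation and an `Any` fallback:

```python
from typing import Any, List

# Precisely annotated: a static checker such as mypy can verify that
# callers pass a list of ints and use the int result correctly.
def collect_count(values: List[int]) -> int:
    return len(values)

# `Any` fallback: this signature type-checks against anything, so the
# checker can no longer flag misuse -- the kind of coverage gap the
# thread describes.
def collect_count_untyped(values: Any) -> Any:
    return len(values)

# At runtime both behave identically; the annotations only matter to
# static analysis tools, which is why such gaps are easy to miss.
print(collect_count([1, 2, 3]))
print(collect_count_untyped([1, 2, 3]))
```

This is also why incompleteness resembles code coverage, as noted above: both variants run fine, and only a checker (or an audit of the stubs) reveals where the `Any` fallbacks remain.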