Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Hyukjin Kwon
Thanks Maciej and Fokko. On Fri, Aug 28, 2020 at 6:09 AM, Maciej wrote: > On my side, I'll try to identify any possible problems by the end of the > week or so (at somewhat crude inspection there is nothing unexpected or > particularly hard to resolve, but sometimes problems occur when you try to >

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
On my side, I'll try to identify any possible problems by the end of the week or so (at somewhat crude inspection there is nothing unexpected or particularly hard to resolve, but sometimes problems occur when you try to refine things) and I'll post an update. Maybe we could take it from there? In

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
Oh, this is probably because of how annotations are handled. In general stubs take precedence over inline annotations and are considered the only source of type hints, unless the package is marked as partially typed (https://www.python.org/dev/peps/pep-0561/#id21). In such case however is

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Driesprong, Fokko
Looking at it a second time, I think it is only mypy that's complaining:
MacBook-Pro-van-Fokko:spark fokkodriesprong$ git diff
diff --git a/python/pyspark/accumulators.pyi b/python/pyspark/accumulators.pyi
index f60de25704..6eafe46a46 100644
--- a/python/pyspark/accumulators.pyi
+++

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
Well, technically speaking annotation and actual are not the same thing. Many parts of the Spark API might require heavy overloads to either capture relationships between arguments (for example in the case of ML) or to capture at least rudimentary relationships between inputs and outputs (e.g. udfs).
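To make the point about overloads concrete, here is a minimal sketch of how `typing.overload` can tie the return type to the way a function is called. It assumes a UDF-style helper that works both as a direct call and as a decorator factory; `my_udf`, `UserDefinedFunction`, and the string-valued `return_type` are hypothetical stand-ins for illustration, not PySpark's actual signatures or stubs.

```python
from typing import Callable, Optional, Union, overload

# Hypothetical alias for "any Python callable" used as a UDF body.
Func = Callable[..., object]


class UserDefinedFunction:
    """Hypothetical stand-in for a wrapped UDF; not PySpark's class."""

    def __init__(self, func: Func, return_type: str) -> None:
        self.func = func
        self.return_type = return_type


# Called with a function: the result is the wrapped UDF itself.
@overload
def my_udf(f: Func, return_type: str = ...) -> UserDefinedFunction: ...


# Called without a function: the result is a decorator that wraps one later.
@overload
def my_udf(
    f: None = ..., *, return_type: str = ...
) -> Callable[[Func], UserDefinedFunction]: ...


def my_udf(
    f: Optional[Func] = None, return_type: str = "string"
) -> Union[UserDefinedFunction, Callable[[Func], UserDefinedFunction]]:
    if f is None:
        def wrap(func: Func) -> UserDefinedFunction:
            return UserDefinedFunction(func, return_type)
        return wrap
    return UserDefinedFunction(f, return_type)


direct = my_udf(len)                        # seen as UserDefinedFunction
decorated = my_udf(return_type="int")(len)  # decorator form, same result type
print(type(direct).__name__, decorated.return_type)
```

Without the two `@overload` signatures a checker would only see the `Union` return type of the implementation, which is the kind of lost input/output relationship the message describes.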

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Maciej
That doesn't sound right. Would it be a problem for you to provide a reproducible example? On 8/27/20 6:09 PM, Driesprong, Fokko wrote: > Today I've updated [SPARK-17333][PYSPARK] Enable mypy on the > repository and while > doing so I've noticed that

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Driesprong, Fokko
>>>>>> and does not annotate types in some other APIs (by using Any). >>>>>> Correct me >>>>>> if I am wrong, Maciej. >>>>>> >>>>>> I’

Re: [PySpark] Revisiting PySpark type annotations

2020-08-27 Thread Hyukjin Kwon
>>>>> I’ll have a short sync with him and share to understand better since >>>>> he’d probably know the context best in PySpark type hints and I know some >>>>> contexts in ASF an

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Driesprong, Fokko
>>>>> Indeed, though the possible advantage is that in theory, you can >>>>>

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Hyukjin Kwon
>> contexts in ASF and Apache Spark. >>> >>> On Wed, Aug 5, 2020 at 6:31 AM, Maciej Szymkiewicz wrote: >>> >>>> Indeed, though the possible advantage is that in theory, you can have >>>> different release cycle than for the main repo (I am not sure i

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Driesprong, Fokko
I am not sure if that's >>> feasible in practice or if that was the intention). >>> >>> I guess all depends on how we envision the future of annotations >>> (including, but not limited to, how conservative we want to be in the >>> future). Which is probably something that should

Re: [PySpark] Revisiting PySpark type annotations

2020-08-05 Thread Driesprong, Fokko
ix Cheung wrote: >> >> So IMO maintaining outside in a separate repo is going to be harder. That >> was why I asked. >> >> >> >> ---------- >> *From:* Maciej Szymkiewicz >> >> *Sent:* Tuesday, August 4, 2020 12:59 PM >>

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Hyukjin Kwon
esday, August 4, 2020 12:59 PM > *To:* Sean Owen > *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; Spark > Dev List > *Subject:* Re: [PySpark] Revisiting PySpark type annotations > > > On 8/4/20 9:35 PM, Sean Owen wrote > > Yes, but the general ar

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
asked. > > >   > > *From:* Maciej Szymkiewicz > *Sent:* Tuesday, August 4, 2020 12:59 PM > *To:* Sean Owen > *Cc:* Felix Cheung; Hyukjin Kwon; Driesprong, Fokko; Holden Karau; > Spark Dev List > *Subject:* Re: [PySpark] Revisiting PySpark type annotations

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
: [PySpark] Revisiting PySpark type annotations On 8/4/20 9:35 PM, Sean Owen wrote > Yes, but the general argument you make here is: if you tie this > project to the main project, it will _have_ to be maintained by > everyone. That's good, but also exactly I think the downside we want

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
On 8/4/20 9:35 PM, Sean Owen wrote > Yes, but the general argument you make here is: if you tie this > project to the main project, it will _have_ to be maintained by > everyone. That's good, but also exactly I think the downside we want > to avoid at this stage (I thought?) I understand for some

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Sean Owen
On Tue, Aug 4, 2020 at 2:32 PM Maciej Szymkiewicz wrote: > > First of all why ASF ownership? > > For the project of this size maintaining high quality (it is not hard to use > stubgen or monkeytype, but resulting annotations are rather simplistic) > annotations independent of the actual

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Maciej Szymkiewicz
separate git repo? >> >> >> From: Hyukjin Kwon >> Sent: Monday, August 3, 2020 1:58:55 AM >> To: Maciej Szymkiewicz >> Cc: Driesprong, Fokko ; Holden Karau >> ; Spark Dev List >> Subject: Re: [PySpark] Revisiting PySpark type annotati

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Sean Owen
> Cc: Driesprong, Fokko; Holden Karau; Spark Dev List > Subject: Re: [PySpark] Revisiting PySpark type annotations > > Okay, seems like we can create a separate repo as apache/spark? e.g.) > https://issues.apache.org/jira/browse/INFRA-20470 > We can also think about port

Re: [PySpark] Revisiting PySpark type annotations

2020-08-04 Thread Felix Cheung
What would be the reason for separate git repo? From: Hyukjin Kwon Sent: Monday, August 3, 2020 1:58:55 AM To: Maciej Szymkiewicz Cc: Driesprong, Fokko ; Holden Karau ; Spark Dev List Subject: Re: [PySpark] Revisiting PySpark type annotations Okay, seems like

Re: [PySpark] Revisiting PySpark type annotations

2020-08-03 Thread Driesprong, Fokko
Cool stuff! Moving it to the ASF would be a great first step. I think you might want to check the IP Clearance template: http://incubator.apache.org/ip-clearance/ip-clearance-template.html This is the one being used when donating the Airflow Kubernetes operator from Google to the ASF:

Re: [PySpark] Revisiting PySpark type annotations

2020-08-03 Thread Hyukjin Kwon
Okay, seems like we can create a separate repo as apache/spark? e.g.) https://issues.apache.org/jira/browse/INFRA-20470 We can also think about porting the files as they are. I will try to have a short sync with the author Maciej, and share what we discussed offline. On Wed, Jul 22, 2020 at 10:43 PM, Maciej

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
On Wednesday, 22 July 2020, Driesprong, Fokko wrote: > That's probably one-time overhead so it is not a big issue. In my > opinion, a bigger one is possible complexity. Annotations tend to introduce > a lot of cyclic dependencies in the Spark codebase. This can be addressed, but > doesn't look

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Driesprong, Fokko
That's probably one-time overhead so it is not a big issue. In my opinion, a bigger one is possible complexity. Annotations tend to introduce a lot of cyclic dependencies in the Spark codebase. This can be addressed, but doesn't look great. This is not true (anymore). With Python 3.6 you can add
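For context on the cyclic-dependency concern, one commonly used pattern is to import names needed only for annotations under `typing.TYPE_CHECKING` and to write forward references as strings. This is a sketch of that general pattern and not necessarily the mechanism this message goes on to describe; `Session`, `Frame`, and the use of `fractions` as the "other module" are hypothetical stand-ins.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Evaluated only by type checkers and skipped at runtime, so this import
    # cannot take part in an import cycle. `fractions` stands in for a module
    # that would otherwise import this one back.
    from fractions import Fraction


class Session:
    def create_frame(self, rows: List[int]) -> "Frame":
        # "Frame" is a string forward reference: the class is defined below.
        return Frame(self, rows)


class Frame:
    def __init__(self, session: "Session", rows: List[int]) -> None:
        self.session = session
        self.rows = rows

    def scaled(self, factor: "Fraction") -> list:
        # The annotation names Fraction without importing it at runtime.
        return [row * factor for row in self.rows]


frame = Session().create_frame([1, 2, 3])
print(frame.rows)
```

The annotations stay available to mypy and to IDEs, while the runtime import graph is unchanged because the guarded import never executes.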

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
On 7/22/20 3:45 AM, Hyukjin Kwon wrote: > For now, I tend to think adding type hints to the code makes it > difficult to backport or revert and > more difficult to discuss typing only, especially considering > typing is arguably premature yet. About being premature ‒ since typing ecosystem

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
On 7/21/20 9:40 PM, Holden Karau wrote: > Yeah I think this could be a great project now that we're only Python > 3.5+. One potential is making this an Outreachy project to get more > folks from different backgrounds involved in Spark. I am honestly not sure if that's really the case. At the

Re: [PySpark] Revisiting PySpark type annotations

2020-07-22 Thread Maciej Szymkiewicz
On 7/22/20 3:45 AM, Hyukjin Kwon wrote: > > Yeah, I tend to be positive about leveraging the Python type hints in > general. > > However, just to clarify, I don’t think we should just port the type > hints into the main codes yet but maybe think about > having/porting Maciej's work, pyi files as

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread Hyukjin Kwon
Yeah, I tend to be positive about leveraging the Python type hints in general. However, just to clarify, I don’t think we should just port the type hints into the main codes yet but maybe think about having/porting Maciej's work, pyi files as stubs. For now, I tend to think adding type hints to

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread Driesprong, Fokko
Fully agree Holden, would be great to include the Outreachy project. Adding annotations is a very friendly way to get familiar with the codebase. I've also created a PR to see what's needed to get mypy in: https://github.com/apache/spark/pull/29180 From there on we can start adding annotations.

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread Holden Karau
Yeah I think this could be a great project now that we're only Python 3.5+. One potential is making this an Outreachy project to get more folks from different backgrounds involved in Spark. On Tue, Jul 21, 2020 at 12:33 PM Driesprong, Fokko wrote: > Since we've recently dropped support for

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread Driesprong, Fokko
Since we've recently dropped support for Python <=3.5, I think it would be nice to add support for type annotations. Having this in the main repository allows us to do type checking using MyPy in the CI itself.

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread zero323
Given a discussion related to SPARK-32320 PR I'd like to resurrect this thread. Is there any interest in migrating annotations to the main repository? -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

Re: [PySpark] Revisiting PySpark type annotations

2020-07-21 Thread zero323
Given a discussion related to SPARK-32320 PR I'd like to resurrect this thread.

Re: [PySpark] Revisiting PySpark type annotations

2019-01-26 Thread zero323
As already pointed out by Nicholas, there is no Python 2 conflict here. Moreover, despite the fact that I used Python 3 specific feature, Python 2 users can benefit from the annotations as well in some circumstances (already mentioned MyPy is one option, PyCharm another, maybe some other tools as

Re: [PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Nicholas Chammas
I think the annotations are compatible with Python 2 since Maciej implemented them via stub files, which Python 2 simply ignores. Folks using mypy to check types will get the benefit whether they're on Python 2 or 3,
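The claim that the interpreter ignores stub files can be checked with a small experiment. The sketch below runs on Python 3 rather than Python 2, on the assumption that the import machinery treats `.pyi` files the same way in this respect; the module name `demo_stubbed` is made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp)
    # Runtime module: plain code with no annotations at all.
    (pkg / "demo_stubbed.py").write_text("def add(a, b):\n    return a + b\n")
    # Stub file: read by type checkers such as mypy, never imported at runtime.
    (pkg / "demo_stubbed.pyi").write_text("def add(a: int, b: int) -> int: ...\n")

    sys.path.insert(0, tmp)
    try:
        demo = importlib.import_module("demo_stubbed")
    finally:
        sys.path.remove(tmp)

result = demo.add(1, 2)
# The function object carries no annotations: the .pyi file was not loaded.
runtime_annotations = dict(demo.add.__annotations__)
print(result, runtime_annotations)
```

The behaviour comes entirely from the `.py` file, which is why shipping stubs alongside the code does not affect users who never run a type checker.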

Re: [PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Reynold Xin
If we can make the annotation compatible with Python 2, why don’t we add type annotation to make life easier for users of Python 3 (with type)? On Fri, Jan 25, 2019 at 7:53 AM Maciej Szymkiewicz wrote: > > Hello everyone, > > I'd like to revisit the topic of adding PySpark type annotations in

[PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Maciej Szymkiewicz
Hello everyone, I'd like to revisit the topic of adding PySpark type annotations in 3.0. It has been discussed before ( http://apache-spark-developers-list.1001551.n3.nabble.com/Python-friendly-API-for-Spark-3-0-td25016.html and