Thanks all. I created a JIRA at https://issues.apache.org/jira/browse/SPARK-43907.
On Mon, 29 May 2023 at 09:12, Hyukjin Kwon <gurwls...@apache.org> wrote:

> Yes, some were cases like you mentioned.
> But I found myself explaining that reason to a lot of people, not only
> developers but users - I was asked in a conference, email, slack,
> internally and externally.
> Then realised that maybe we're doing something wrong. This is based on my
> experience so I wanted to open a discussion and see what others think about
> this :-).
>
> On Sat, 27 May 2023 at 00:19, Maciej <mszymkiew...@gmail.com> wrote:
>
>> Weren't some of these functions provided only for compatibility and
>> intentionally left out of the language APIs?
>>
>> --
>> Best regards,
>> Maciej
>>
>> On 5/25/23 23:21, Hyukjin Kwon wrote:
>>
>> I don't think it'd be a release blocker .. I think we can implement them
>> across multiple releases.
>>
>> On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Thank you for the proposal.
>>>
>>> I'm wondering if we are going to consider them as release blockers or
>>> not.
>>>
>>> In general, I don't think those SQL functions should be available in all
>>> languages as release blockers.
>>> (Especially in R or new Spark Connect languages like Go and Rust).
>>>
>>> If they are not release blockers, we may allow some existing or future
>>> community PRs only before feature freeze (= branch cut).
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>> On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:
>>>
>>>> +1
>>>> It is important that different APIs can be used to call the same
>>>> function.
>>>>
>>>> On Thu, 25 May 2023 at 01:48, Ryan Berti <rbe...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>> During my recent experience developing functions, I found that
>>>>> identifying locations (sql + connect functions.scala + functions.py,
>>>>> FunctionRegistry, + whatever is required for R) and standards for adding
>>>>> function signatures was not straightforward (should you use optional args
>>>>> or overload functions? which col/lit helpers should be used when?). Are
>>>>> there docs describing all of the locations + standards for defining a
>>>>> function? If not, that'd be great to have too.
>>>>>
>>>>> Ryan Berti
>>>>>
>>>>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Functions available in SQL (more generally, in one API) should be
>>>>>> available in all APIs. I am very much in favor of this.
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to discuss adding all SQL functions into the Scala, Python
>>>>>> and R APIs.
>>>>>> Around 175 SQL functions do not exist in the Scala, Python and R APIs.
>>>>>> For example, we don’t have pyspark.sql.functions.percentile but you
>>>>>> can invoke it as a SQL function, e.g., SELECT percentile(...).
>>>>>>
>>>>>> The reason why we do not have all functions in the first place is
>>>>>> that we want to only add commonly used functions, see also
>>>>>> https://github.com/apache/spark/pull/21318 (which I agreed with at
>>>>>> the time).
>>>>>>
>>>>>> However, this has been raised multiple times over the years, from the
>>>>>> OSS community, dev mailing list, JIRAs, Stack Overflow, etc.
>>>>>> It seems to be confusing which functions are available and which are not.
>>>>>>
>>>>>> Yes, we have a workaround. We can call all expressions by expr("...")
>>>>>> or call_udf("...", Columns ...)
>>>>>> But still it seems that it’s not very user-friendly because users
>>>>>> expect them to be available under the functions namespace.
>>>>>>
>>>>>> Therefore, I would like to propose adding all expressions into all
>>>>>> languages so that Spark is simpler and less confusing, e.g., which API is
>>>>>> in functions or not.
>>>>>>
>>>>>> Any thoughts?
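
For readers following the thread, here is a rough sketch of the expr("...") workaround described in the original proposal, written for PySpark. It is only an illustration under stated assumptions: the helper names `sql_call` and `percentile_via_expr`, the column name `value`, and the local Spark session settings are all made up for this example and do not come from the thread. It assumes PySpark is installed and that `percentile` is not exposed under `pyspark.sql.functions`, as the proposal says.

```python
def sql_call(name, *args):
    """Build the SQL text of a function call, e.g. 'percentile(value, 0.5)'.

    Hypothetical helper for this sketch; it only formats a string.
    """
    return f"{name}({', '.join(str(a) for a in args)})"


def percentile_via_expr(values, fraction):
    """Compute a percentile by routing the SQL function through expr().

    Hypothetical helper for this sketch. PySpark is imported lazily so that
    sql_call() above can be used without a Spark installation.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    # Local single-threaded session; the settings are assumptions for the demo.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("expr-workaround-sketch")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame([(float(v),) for v in values], ["value"])
        # The SQL function is reached by name through expr(), not through
        # a dedicated wrapper in pyspark.sql.functions.
        row = df.agg(expr(sql_call("percentile", "value", fraction)).alias("p")).first()
        return row["p"]
    finally:
        spark.stop()
```

With PySpark available, `percentile_via_expr([1, 2, 3, 4, 5], 0.5)` evaluates `percentile(value, 0.5)` over the column. The point of the proposal is that users would rather find such a function directly in the functions namespace than assemble a SQL string like this.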