+1

It is important that different APIs can be used to call the same function.
Ryan Berti <rbe...@netflix.com.invalid> wrote on Thu, May 25, 2023 at 01:48:

> During my recent experience developing functions, I found that identifying
> the locations (sql + connect functions.scala + functions.py,
> FunctionRegistry, + whatever is required for R) and the standards for
> adding function signatures was not straightforward (should you use
> optional args or overload functions? which col/lit helpers should be used
> when?). Are there docs describing all of the locations + standards for
> defining a function? If not, that'd be great to have too.
>
> Ryan Berti
> Senior Data Engineer | Ads DE
>
> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
> wrote:
>
>> +1
>>
>> Functions available in SQL (more generally, in one API) should be
>> available in all APIs. I am very much in favor of this.
>>
>> Enrico
>>
>> On May 24, 2023 at 09:41, Hyukjin Kwon wrote:
>>
>> Hi all,
>>
>> I would like to discuss adding all SQL functions to the Scala, Python,
>> and R APIs.
>> We have around 175 SQL functions that do not exist in Scala, Python, and
>> R. For example, we don't have pyspark.sql.functions.percentile, but you
>> can invoke it as a SQL function, e.g., SELECT percentile(...).
>>
>> The reason why we did not add all functions in the first place is that
>> we wanted to add only commonly used functions; see also
>> https://github.com/apache/spark/pull/21318 (which I agreed with at the
>> time).
>>
>> However, this has been raised multiple times over the years, from the
>> OSS community, the dev mailing list, JIRAs, Stack Overflow, etc.
>> It seems confusing which functions are available and which are not.
>>
>> Yes, we have a workaround: we can call any expression with expr("...")
>> or call_udf("...", Columns ...).
>> But that still does not seem very user-friendly, because users expect
>> these functions to be available under the functions namespace.
>>
>> Therefore, I would like to propose adding all expressions to all
>> languages, so that Spark is simpler and less confusing about which
>> functions are available in the functions namespace and which are not.
>>
>> Any thoughts?
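
For readers following the thread, here is a minimal PySpark sketch of the
expr("...") workaround discussed above versus the proposed direct call. The
SparkSession, the DataFrame, and its "value" column are made up for
illustration, and the percentile import shown at the end is hypothetical at
the time of this thread, since the function only existed on the SQL side:

    # Minimal sketch, assuming a local Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (2.0,), (3.0,)], ["value"])

    # Workaround discussed above: reach the SQL-only function via expr(...).
    df.select(expr("percentile(value, 0.5)").alias("median")).show()

    # The same function invoked as plain SQL.
    df.createOrReplaceTempView("t")
    spark.sql("SELECT percentile(value, 0.5) AS median FROM t").show()

    # What the proposal asks for (hypothetical here; percentile was not
    # available under pyspark.sql.functions when this thread was written):
    # from pyspark.sql.functions import percentile
    # df.select(percentile("value", 0.5)).show()

The expr route works, but the function name is only checked at analysis time
and is not discoverable under the functions namespace, which is the
user-friendliness gap the proposal addresses.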