During my recent experience developing functions, I found that identifying
the locations to touch (the sql and connect copies of functions.scala and
functions.py, FunctionRegistry, plus whatever is required for R) and the
standards for adding function signatures was not straightforward (should
you use optional args or overload functions? which col/lit helpers should
be used when?). Are there docs describing all of the locations and
standards for defining a function? If not, that would be great to have too.
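
For concreteness, here is a rough sketch of the two signature styles I'm
asking about, written against the public API (the helper names and the
frequency default are hypothetical, and the bodies just lean on the
call_udf workaround mentioned below rather than the internal expression
builders):

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{call_udf, lit}

    // Style 1: overloads -- a Column-based signature plus a convenience
    // overload that lifts a literal percentage with lit()
    def percentile(col: Column, percentage: Column): Column =
      call_udf("percentile", col, percentage)
    def percentile(col: Column, percentage: Double): Column =
      percentile(col, lit(percentage))

    // Style 2: a single signature with an optional (defaulted) argument
    // instead of an extra overload
    def percentileWithFrequency(
        col: Column,
        percentage: Column,
        frequency: Column = lit(1L)): Column =
      call_udf("percentile", col, percentage, frequency)

Knowing which of these styles (and which file locations) the project
prefers would make contributing a new function much easier.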

Ryan Berti

Senior Data Engineer  |  Ads DE




On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
wrote:

> +1
>
> Functions available in SQL (or, more generally, in any one API) should be
> available in all APIs. I am very much in favor of this.
>
> Enrico
>
>
> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>
> Hi all,
>
> I would like to discuss adding all SQL functions to the Scala, Python and R
> APIs.
> There are around 175 SQL functions that do not exist in Scala, Python and R.
> For example, we don't have pyspark.sql.functions.percentile, but you can
> invoke it as a SQL function, e.g., SELECT percentile(...).
>
> The reason why we do not have all of these functions in the first place is
> that we wanted to add only commonly used functions, see also
> https://github.com/apache/spark/pull/21318 (which I agreed with at that time).
>
> However, this has been raised multiple times over the years, from the OSS
> community, the dev mailing list, JIRAs, Stack Overflow, etc.
> It seems confusing which functions are available and which are not.
>
> Yes, we have a workaround. We can call any expression via expr("...") or
> call_udf("...", Column ...).
> But this is still not very user-friendly, because users expect these
> functions to be available under the functions namespace.
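> For example, roughly (a self-contained sketch; the data is made up):
>
>     import org.apache.spark.sql.SparkSession
>     import org.apache.spark.sql.functions.{expr, call_udf, col, lit}
>
>     val spark = SparkSession.builder().master("local[*]").getOrCreate()
>     import spark.implicits._
>     val df = Seq(1.0, 2.0, 3.0, 4.0).toDF("value")
>
>     // today: the string-based escape hatch
>     df.select(expr("percentile(value, 0.5)")).show()
>
>     // or call the SQL function by name with Column arguments
>     df.select(call_udf("percentile", col("value"), lit(0.5))).show()
>
>     // what users expect: a first-class helper under functions
>     // df.select(percentile(col("value"), lit(0.5)))
>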
>
> Therefore, I would like to propose adding all expressions to all
> languages so that Spark is simpler and less confusing, e.g., about which
> functions are available in the functions API and which are not.
>
> Any thoughts?
>
>
>
