+1

Functions available in SQL (or, more generally, in any one API) should be available in all APIs. I am very much in favor of this.

Enrico


On 24.05.23 at 09:41, Hyukjin Kwon wrote:

Hi all,

I would like to discuss adding all SQL functions to the Scala, Python and R APIs.
There are around 175 SQL functions that do not exist in Scala, Python and R.
For example, we don't have pyspark.sql.functions.percentile, but you can invoke
it as a SQL function, e.g., SELECT percentile(...).
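
For concreteness, here is a minimal sketch of what this looks like today in PySpark (the column name, the temp view name, and the 0.5 percentage are only placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(100).toDF("value").createOrReplaceTempView("t")

    # Works: percentile resolves as a SQL function.
    spark.sql("SELECT percentile(value, 0.5) FROM t").show()

    # Does not work at the time of this thread: no such function in the namespace.
    # from pyspark.sql.functions import percentile  # ImportError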

The reason we did not add all functions in the first place is that we wanted to add only commonly used functions; see also https://github.com/apache/spark/pull/21318 (which I agreed with at the time).

However, this has been raised multiple times over the years, from the OSS community, the dev mailing list, JIRAs, Stack Overflow, etc.
It seems confusing which functions are available and which are not.

Yes, we have a workaround: we can call all expressions via expr("...") or call_udf("...", Column...). But it is still not very user-friendly, because users expect these functions to be available under the functions namespace.
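
To spell out the workaround, a minimal sketch; percentile and the 0.5 argument are just examples, and any SQL-only expression works the same way:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr, call_udf, col, lit

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(100).toDF("value")

    # Workaround 1: wrap the whole SQL expression in expr("...").
    df.select(expr("percentile(value, 0.5)")).show()

    # Workaround 2: call the function by name with Column arguments.
    df.select(call_udf("percentile", col("value"), lit(0.5))).show()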

Therefore, I would like to propose adding all expressions to all language APIs so that Spark is simpler and less confusing, e.g., there is no question about which functions exist in the functions namespace.

Any thoughts?
