+1
Functions available in SQL (the more general API) should be available
in all APIs. I am very much in favor of this.
Enrico
On 24.05.23 at 09:41, Hyukjin Kwon wrote:
Hi all,
I would like to discuss adding all SQL functions to the Scala, Python
and R APIs.
We have around 175 SQL functions that do not exist in Scala, Python and R.
For example, we don't have pyspark.sql.functions.percentile, but you
can invoke it as a SQL function, e.g., SELECT percentile(...).
The reason we do not have all the functions in the first place is that
we wanted to add only commonly used functions; see also
https://github.com/apache/spark/pull/21318 (which I agreed with at the time).
However, this has been raised multiple times over the years, from the OSS
community, the dev mailing list, JIRAs, Stack Overflow, etc.
It seems confusing which functions are available and which are not.
Yes, we have a workaround: we can call any expression via
expr("...") or call_udf("...", Column ...).
But this is still not very user-friendly, because users expect these
functions to be available under the functions namespace.
Therefore, I would like to propose adding all these expressions to all
language APIs, so that Spark is simpler and less confusing, e.g., about
which functions are in the functions namespace and which are not.
Any thoughts?