-- Best regards, Maciej
On 5/25/23 23:21, Hyukjin Kwon wrote:
I don't think it'd be a release blocker. I think we can implement them across multiple releases.

On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

Thank you for the proposal. I'm wondering if we are going to consider them as release blockers or not. In general, I don't think making those SQL functions available in all languages should be a release blocker (especially in R or new Spark Connect languages like Go and Rust). If they are not release blockers, we may allow some existing or future community PRs only before feature freeze (= branch cut).

Thanks, Dongjoon.

On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:

+1 It is important that different APIs can be used to call the same function.

On Thu, May 25, 2023 at 01:48, Ryan Berti <rbe...@netflix.com.invalid> wrote:

During my recent experience developing functions, I found that identifying the locations (sql + connect functions.scala + functions.py, FunctionRegistry, + whatever is required for R) and the standards for adding function signatures was not straightforward (should you use optional args or overload functions? which col/lit helpers should be used when?). Are there docs describing all of the locations + standards for defining a function? If not, that'd be great to have too.

Ryan Berti
Senior Data Engineer | Ads DE
M 7023217573
5808 W Sunset Blvd | Los Angeles, CA 90028

On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev> wrote:

+1 Functions available in SQL (or, more generally, in one API) should be available in all APIs. I am very much in favor of this.

Enrico

On 24.05.23 at 09:41, Hyukjin Kwon wrote:

Hi all,

I would like to discuss adding all SQL functions into the Scala, Python and R APIs. Around 175 SQL functions do not exist in Scala, Python and R.
For example, we don’t have |pyspark.sql.functions.percentile|, but you can invoke it as a SQL function, e.g., |SELECT percentile(...)|.

The reason we do not have all functions in the first place is that we wanted to add only commonly used functions; see also https://github.com/apache/spark/pull/21318 (which I agreed with at the time).

However, this has been raised multiple times over the years, from the OSS community, the dev mailing list, JIRAs, Stack Overflow, etc. It seems to be confusing which functions are available and which are not.

Yes, we have a workaround: we can call any expression with |expr("...")| or |call_udf("...", Columns ...)|. But that still does not seem very user-friendly, because users expect these functions to be available under the functions namespace.

Therefore, I would like to propose adding all expressions to all languages so that Spark is simpler and less confusing, e.g., about which API is in functions and which is not.

Any thoughts?
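For readers unfamiliar with the |expr("...")| workaround mentioned in the proposal, here is a minimal PySpark sketch of it, using percentile as the example. The helper names |percentile_sql| and |median_via_expr| are hypothetical and exist only for this illustration; they are not part of PySpark. The sketch assumes a local Spark installation, and it imports Spark lazily so the string-building helper can be used on its own.

```python
def percentile_sql(column: str, percentage: float) -> str:
    """Build the SQL text for percentile(column, percentage).

    Hypothetical helper for illustration; not part of PySpark.
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    return f"percentile({column}, {percentage})"


def median_via_expr(rows):
    """Compute the median of `rows` through the expr() workaround.

    Hypothetical helper; assumes PySpark is installed and can start locally.
    """
    # Imported here so percentile_sql() works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        df = spark.createDataFrame([(v,) for v in rows], ["v"])
        # percentile is reached as a SQL function via expr(),
        # which parses the SQL text into a Column.
        row = df.agg(expr(percentile_sql("v", 0.5)).alias("p50")).first()
        return row["p50"]
    finally:
        spark.stop()


if __name__ == "__main__":
    print(median_via_expr([1, 2, 3, 4, 5]))
```

The point of the proposal is that users should not need a string-based wrapper like this one; a function exposed directly under the functions namespace would replace it.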