Thanks all. I created a JIRA at
https://issues.apache.org/jira/browse/SPARK-43907.

On Mon, 29 May 2023 at 09:12, Hyukjin Kwon <gurwls...@apache.org> wrote:

> Yes, some were cases like you mentioned.
> But I found myself explaining that reason to a lot of people, not only
> developers but also users - I was asked at conferences, by email, on Slack,
> internally and externally.
> Then I realised that maybe we're doing something wrong. This is based on my
> experience, so I wanted to open a discussion and see what others think about
> this :-).
>
>
>
>
> On Sat, 27 May 2023 at 00:19, Maciej <mszymkiew...@gmail.com> wrote:
>
>> Weren't some of these functions provided only for compatibility and
>> intentionally left out of the language APIs?
>>
>> --
>> Best regards,
>> Maciej
>>
>> On 5/25/23 23:21, Hyukjin Kwon wrote:
>>
>> I don't think it'd be a release blocker .. I think we can implement them
>> across multiple releases.
>>
>> On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Thank you for the proposal.
>>>
>>> I'm wondering if we are going to consider them as release blockers or
>>> not.
>>>
>>> In general, I don't think making those SQL functions available in all
>>> languages should be treated as a release blocker.
>>> (Especially in R or new Spark Connect languages like Go and Rust).
>>>
>>> If they are not release blockers, we may allow some existing or future
>>> community PRs only before feature freeze (= branch cut).
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:
>>>
>>>> +1
>>>> It is important that different APIs can be used to call the same
>>>> function
>>>>
>>>> On Thu, 25 May 2023 at 01:48, Ryan Berti <rbe...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>> During my recent experience developing functions, I found that
>>>>> identifying locations (sql + connect functions.scala + functions.py,
>>>>> FunctionRegistry, + whatever is required for R) and standards for adding
>>>>> function signatures was not straightforward (should you use optional args
>>>>> or overload functions? which col/lit helpers should be used when?). Are
>>>>> there docs describing all of the locations + standards for defining a
>>>>> function? If not, that'd be great to have too.
>>>>>
>>>>> Ryan Berti
>>>>>
>>>>> Senior Data Engineer  |  Ads DE
>>>>>
>>>>> M 7023217573
>>>>>
>>>>> 5808 W Sunset Blvd  |  Los Angeles, CA 90028
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Functions available in SQL (more general in one API) should be
>>>>>> available in all APIs. I am very much in favor of this.
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>>
>>>>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to discuss adding all SQL functions into Scala, Python
>>>>>> and R API.
>>>>>> We have around 175 SQL functions that do not exist in Scala, Python
>>>>>> and R.
>>>>>> For example, we don’t have pyspark.sql.functions.percentile but you
>>>>>> can invoke
>>>>>> it as a SQL function, e.g., SELECT percentile(...).
>>>>>>
>>>>>> The reason why we do not have all functions in the first place is
>>>>>> that we wanted to
>>>>>> add only commonly used functions; see also
>>>>>> https://github.com/apache/spark/pull/21318 (which I agreed with at the
>>>>>> time).
>>>>>>
>>>>>> However, this has been raised multiple times over the years, from the
>>>>>> OSS community, dev mailing list, JIRAs, Stack Overflow, etc.
>>>>>> It seems to be confusing which functions are available and which are not.
>>>>>>
>>>>>> Yes, we have a workaround. We can call all expressions by expr("...")
>>>>>> or call_udf("...", Columns ...).
>>>>>> But it still does not seem very user-friendly, because users
>>>>>> expect them to be available under the functions namespace.
>>>>>>
>>>>>> Therefore, I would like to propose adding all expressions into all
>>>>>> languages so that Spark is simpler and less confusing about, e.g., which
>>>>>> API is in functions and which is not.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>>
>>>>>>
>>
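
To make the expr("...") workaround from the original proposal concrete, here is a minimal sketch of how a SQL-only function call such as percentile(...) can be written out as expression text from Python. The helper name sql_call and its argument handling are hypothetical and are not part of any Spark API. It assumes string arguments are column names and numbers are literals, and it does no quoting or escaping.

```python
def sql_call(function_name, *args):
    """Render a SQL function call such as percentile(value, 0.5) as text.

    Hypothetical helper, not a Spark API: string arguments are treated as
    column names, numbers as literals, and nothing is quoted or escaped.
    """
    if not function_name.isidentifier():
        raise ValueError(f"not a plain function name: {function_name!r}")
    rendered = []
    for arg in args:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(arg, bool) or not isinstance(arg, (str, int, float)):
            raise TypeError(f"unsupported argument: {arg!r}")
        rendered.append(arg if isinstance(arg, str) else repr(arg))
    return f"{function_name}({', '.join(rendered)})"


# The function mentioned in the thread that has no pyspark.sql.functions wrapper.
expression = sql_call("percentile", "value", 0.5)
print(expression)  # percentile(value, 0.5)
```

With PySpark installed, the resulting text could be passed to pyspark.sql.functions.expr(expression), for example inside df.agg(...), which is the workaround the thread describes. A dedicated wrapper in the functions namespace would remove the need to build such strings by hand.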
