Weren't some of these functions provided only for compatibility and intentionally left out of the language APIs?

--
Best regards,
Maciej

On 5/25/23 23:21, Hyukjin Kwon wrote:
I don't think it'd be a release blocker .. I think we can implement them across multiple releases.

On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

    Thank you for the proposal.

    I'm wondering if we are going to consider them as release blockers
    or not.

    In general, I don't think those SQL functions should be available
    in all languages as release blockers.
    (Especially in R or new Spark Connect languages like Go and Rust).

    If they are not release blockers, we may allow some existing or
    future community PRs only before feature freeze (= branch cut).

    Thanks,
    Dongjoon.


    On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:

        +1
        It is important that different APIs can be used to call the
        same function

        On Thu, May 25, 2023 at 1:48 AM Ryan Berti
        <rbe...@netflix.com.invalid> wrote:

            During my recent experience developing functions, I found
            that identifying the locations to update (sql + connect
            functions.scala + functions.py, FunctionRegistry, +
            whatever is required for R) and the standards for adding
            function signatures was not straightforward (should you
            use optional args or overload functions? which col/lit
            helpers should be used when?). Are there docs describing
            all of the locations + standards for defining a function?
            If not, that'd be great to have too.

            Ryan Berti


            On Wed, May 24, 2023 at 12:44 AM Enrico Minack
            <i...@enrico.minack.dev> wrote:

                +1

                Functions available in SQL (more general in one API)
                should be available in all APIs. I am very much in
                favor of this.

                Enrico


                On 24.05.23 at 09:41, Hyukjin Kwon wrote:

                Hi all,

                I would like to discuss adding all SQL functions to
                the Scala, Python and R APIs.
                Around 175 SQL functions do not exist in Scala,
                Python and R.
                For example, we don’t have
                |pyspark.sql.functions.percentile| but you can invoke
                it as a SQL function, e.g., |SELECT percentile(...)|.

                The reason we do not have all functions in the
                first place is that we wanted to
                add only commonly used functions; see also
                https://github.com/apache/spark/pull/21318 (which I
                agreed with at that time).

                However, this has been raised multiple times over
                the years, by the OSS community and on the dev mailing
                list, JIRAs, Stack Overflow, etc.
                It seems to be confusing which functions are
                available and which are not.

                Yes, we have a workaround. We can call all
                expressions by |expr("...")| or |call_udf("...",
                Columns ...)|.
                But it still seems not very user-friendly,
                because users expect them to be available under the
                functions namespace.

                Therefore, I would like to propose adding all
                expressions into all languages so that Spark is
                simpler and less confusing, e.g., about which API is
                in functions and which is not.

                Any thoughts?
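
For readers unfamiliar with the |expr("...")| workaround mentioned in the proposal, here is a minimal sketch of what it might look like in PySpark. The helper names `sql_call` and `median_of` are hypothetical and only for illustration; the only Spark pieces assumed are `pyspark.sql.functions.expr` and the SQL function `percentile` named in the thread.

```python
def sql_call(name, *args):
    """Render a SQL function call string, e.g. percentile(value, 0.5).

    Hypothetical helper: it only builds the text that would be handed
    to expr(); it performs no quoting or validation.
    """
    return f"{name}({', '.join(str(a) for a in args)})"


def median_of(df, column):
    """Aggregate a DataFrame column with the SQL-only percentile function.

    Hypothetical helper. pyspark is imported lazily so that sql_call
    can be used and tested without a Spark installation.
    """
    from pyspark.sql.functions import expr

    # expr() parses the SQL text, so functions that exist only in SQL
    # can still be used from the DataFrame API.
    return df.agg(expr(sql_call("percentile", column, 0.5)).alias("median"))
```

With a running SparkSession, `median_of(df, "value")` would return a one-row DataFrame. The proposal would make such wrappers unnecessary by exposing the function directly under the functions namespace.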


