Please note that the context if TRIM/LTRIM/RTRIM with two-parameters and
TRIM(trimStr FROM str) syntax.

This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM.

On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi, All.
>
> I'm sending this email because the Apache Spark committers had better have
> a consistent point of views for the upcoming PRs. And, the community policy
> is the way to lead the community members transparently and clearly for a
> long term good.
>
> First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache
> Spark 3.0.0 is going to achieve many improvement in SQL areas.
>
> Especially, we invested lots of efforts to improve SQL ANSI compliance and
> related test coverage for the future. One simple example is the following.
>
>     [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
>
> With this support, we are able to move away from one of esoteric Spark
> function `TRIM/LTRIM/RTRIM`.
> (Note that the above syntax is ANSI standard, but we have additional
> non-standard these functions, too.)
>
> Here is the summary of the current status.
>
>     +------------------------+-------------+---------------+
>     | SQL Progressing Engine | TRIM Syntax | TRIM Function |
>     +------------------------------------------------------+
>     | Spark 3.0.0-preview/2  |      O      |       O       |
>     | PostgreSQL             |      O      |       O       |
>     | Presto                 |      O      |       O       |
>     | MySQL                  |      O(*)   |       X       |
>     | Oracle                 |      O      |       X       |
>     | MsSQL                  |      O      |       X       |
>     | Terradata              |      O      |       X       |
>     | Hive                   |      X      |       X       |
>     | Spark 1.6 ~ 2.2        |      X      |       X       |
>     | Spark 2.3 ~ 2.4        |      X      |       O(**)   |
>     +------------------------+-------------+---------------+
>     * MySQL doesn't allow multiple trim character
>     * Spark 2.3 ~ 2.4 have the function in a different way.
>
> Here is the illustrative example of the problem.
>
>     postgres=# SELECT trim('yxTomxx', 'xyz');
>     btrim
>     -------
>     Tom
>
>     presto:default> SELECT trim('yxTomxx', 'xyz');
>     _col0
>     -------
>     Tom
>
>     spark-sql> SELECT trim('yxTomxx', 'xyz');
>     z
>
> Here is our history to fix the above issue.
>
>     [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
>     1. https://github.com/apache/spark/pull/24902
>        (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2
> released.)
>     2. https://github.com/apache/spark/pull/24907
>        (Merged 2019-06-20 for v2.3.4, but reverted)
>     3. https://github.com/apache/spark/pull/24908
>        (Merged 2019-06-21 for v2.4.4, but reverted)
>
> (2) and (3) were reverted before releases because we didn't want to fix
> that in the maintenance releases. Please see the following references of
> the decision.
>
>     https://github.com/apache/spark/pull/24908#issuecomment-504799028
> (2.3)
>     https://github.com/apache/spark/pull/24907#issuecomment-504799021
> (2.4)
>
> Now, there are some requests to revert SPARK-28093 and to keep these
> esoteric functions for backward compatibility and the following reason.
>
>     > Reordering function parameters to match another system,
>     > for a method that is otherwise working correctly,
>     > sounds exactly like a cosmetic change to me.
>
>     > How can we silently change the parameter of an existing SQL function?
>     > I don't think this is a correctness issue as the SQL standard
>     > doesn't say that the function signature have to be trim(srcStr,
> trimStr).
>
> The concern and the point of views make sense.
>
> My concerns are the followings.
>
>     1. This kind of esoteric differences are called `vendor-lock-in`
> stuffs in a negative way.
>        - It's difficult for new users to understand.
>        - It's hard to migrate between Apache Spark and the others.
>     2. Although we did our bests, Apache Spark SQL has not been enough
> always.
>     3. We need to do improvement in the future releases.
>
> In short, we can keep 3.0.0-preview behaviors here or revert SPARK-28093
> in order to keep this vendor-lock-in stuffs for backward-compatibility.
>
> What I want is building a consistent point of views in this category for
> the upcoming PR reviews.
>
> If we have a clear policy, we can save future community efforts in many
> ways.
>
> Bests,
> Dongjoon.
>

Reply via email to