Hi, All.

I'm sending this email because the Apache Spark committers had better have
a consistent point of views for the upcoming PRs. And, the community policy
is the way to lead the community members transparently and clearly for a
long term good.

First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache
Spark 3.0.0 is going to achieve many improvement in SQL areas.

Especially, we invested lots of efforts to improve SQL ANSI compliance and
related test coverage for the future. One simple example is the following.

    [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax

With this support, we are able to move away from one of esoteric Spark
function `TRIM/LTRIM/RTRIM`.
(Note that the above syntax is ANSI standard, but we have additional
non-standard these functions, too.)

Here is the summary of the current status.

    +------------------------+-------------+---------------+
    | SQL Progressing Engine | TRIM Syntax | TRIM Function |
    +------------------------------------------------------+
    | Spark 3.0.0-preview/2  |      O      |       O       |
    | PostgreSQL             |      O      |       O       |
    | Presto                 |      O      |       O       |
    | MySQL                  |      O(*)   |       X       |
    | Oracle                 |      O      |       X       |
    | MsSQL                  |      O      |       X       |
    | Terradata              |      O      |       X       |
    | Hive                   |      X      |       X       |
    | Spark 1.6 ~ 2.2        |      X      |       X       |
    | Spark 2.3 ~ 2.4        |      X      |       O(**)   |
    +------------------------+-------------+---------------+
    * MySQL doesn't allow multiple trim character
    * Spark 2.3 ~ 2.4 have the function in a different way.

Here is the illustrative example of the problem.

    postgres=# SELECT trim('yxTomxx', 'xyz');
    btrim
    -------
    Tom

    presto:default> SELECT trim('yxTomxx', 'xyz');
    _col0
    -------
    Tom

    spark-sql> SELECT trim('yxTomxx', 'xyz');
    z

Here is our history to fix the above issue.

    [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue
    1. https://github.com/apache/spark/pull/24902
       (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2
released.)
    2. https://github.com/apache/spark/pull/24907
       (Merged 2019-06-20 for v2.3.4, but reverted)
    3. https://github.com/apache/spark/pull/24908
       (Merged 2019-06-21 for v2.4.4, but reverted)

(2) and (3) were reverted before releases because we didn't want to fix
that in the maintenance releases. Please see the following references of
the decision.

    https://github.com/apache/spark/pull/24908#issuecomment-504799028 (2.3)
    https://github.com/apache/spark/pull/24907#issuecomment-504799021 (2.4)

Now, there are some requests to revert SPARK-28093 and to keep these
esoteric functions for backward compatibility and the following reason.

    > Reordering function parameters to match another system,
    > for a method that is otherwise working correctly,
    > sounds exactly like a cosmetic change to me.

    > How can we silently change the parameter of an existing SQL function?
    > I don't think this is a correctness issue as the SQL standard
    > doesn't say that the function signature have to be trim(srcStr,
trimStr).

The concern and the point of views make sense.

My concerns are the followings.

    1. This kind of esoteric differences are called `vendor-lock-in` stuffs
in a negative way.
       - It's difficult for new users to understand.
       - It's hard to migrate between Apache Spark and the others.
    2. Although we did our bests, Apache Spark SQL has not been enough
always.
    3. We need to do improvement in the future releases.

In short, we can keep 3.0.0-preview behaviors here or revert SPARK-28093 in
order to keep this vendor-lock-in stuffs for backward-compatibility.

What I want is building a consistent point of views in this category for
the upcoming PR reviews.

If we have a clear policy, we can save future community efforts in many
ways.

Bests,
Dongjoon.

Reply via email to