RTRIM`

Maxim Gekk Sat, 15 Feb 2020 10:03:48 -0800

Also if we look at possible combination of trim parameters:
1. foldable srcStr + foldable trimStr
2. foldable srcStr + non-foldable trimStr
3. non-foldable srcStr + foldable trimStr
4. non-foldable srcStr + non-foldable trimStr


The case # 2 seems a rare case, and # 3 is probably the most common case.
Once we see the second case, we could output a warning with a notice about
the order of parameters.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> It's unfortunate that we don't have a clear document to talk about
> breaking changes (I'm working on it BTW). I believe the general guidance
> is: *avoid breaking changes unless we have to*. For example, the previous
> result was so broken that we have to fix it, moving to Scala 2.12 makes us
> have to break some APIs, etc.
>
> For this particular case, do we have to switch the parameter order? It's
> different from some systems, the order was not decided explicitly, but I
> don't think they are strong reasons. People from RDBMS should use the SQL
> standard TRIM syntax more often. People using prior Spark versions should
> have figured out the parameter order of Spark TRIM (there was no document)
> and adjusted their queries. There is no such standard that defines the
> parameter order of the TRIM function.
>
> In the long term, we would also promote the SQL standard TRIM syntax. I
> don't see much benefit of "fixing" the parameter order that worth to make a
> breaking change.
>
> Thanks,
> Wenchen
>
>
>
> On Sat, Feb 15, 2020 at 3:44 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Please note that the context if TRIM/LTRIM/RTRIM with two-parameters and
>> TRIM(trimStr FROM str) syntax.
>>
>> This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM.
>>
>> On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> I'm sending this email because the Apache Spark committers had better
>>> have a consistent point of views for the upcoming PRs. And, the community
>>> policy is the way to lead the community members transparently and clearly
>>> for a long term good.
>>>
>>> First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache
>>> Spark 3.0.0 is going to achieve many improvement in SQL areas.
>>>
>>> Especially, we invested lots of efforts to improve SQL ANSI compliance
>>> and related test coverage for the future. One simple example is the
>>> following.
>>>
>>>     [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
>>>
>>> With this support, we are able to move away from one of esoteric Spark
>>> function `TRIM/LTRIM/RTRIM`.
>>> (Note that the above syntax is ANSI standard, but we have additional
>>> non-standard these functions, too.)
>>>
>>> Here is the summary of the current status.
>>>
>>>     +------------------------+-------------+---------------+
>>>     | SQL Progressing Engine | TRIM Syntax | TRIM Function |
>>>     +------------------------------------------------------+
>>>     | Spark 3.0.0-preview/2  |      O      |       O       |
>>>     | PostgreSQL             |      O      |       O       |
>>>     | Presto                 |      O      |       O       |
>>>     | MySQL                  |      O(*)   |       X       |
>>>     | Oracle                 |      O      |       X       |
>>>     | MsSQL                  |      O      |       X       |
>>>     | Terradata              |      O      |       X       |
>>>     | Hive                   |      X      |       X       |
>>>     | Spark 1.6 ~ 2.2        |      X      |       X       |
>>>     | Spark 2.3 ~ 2.4        |      X      |       O(**)   |
>>>     +------------------------+-------------+---------------+
>>>     * MySQL doesn't allow multiple trim character
>>>     * Spark 2.3 ~ 2.4 have the function in a different way.
>>>
>>> Here is the illustrative example of the problem.
>>>
>>>     postgres=# SELECT trim('yxTomxx', 'xyz');
>>>     btrim
>>>     -------
>>>     Tom
>>>
>>>     presto:default> SELECT trim('yxTomxx', 'xyz');
>>>     _col0
>>>     -------
>>>     Tom
>>>
>>>     spark-sql> SELECT trim('yxTomxx', 'xyz');
>>>     z
>>>
>>> Here is our history to fix the above issue.
>>>
>>>     [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order
>>> issue
>>>     1. https://github.com/apache/spark/pull/24902
>>>        (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2
>>> released.)
>>>     2. https://github.com/apache/spark/pull/24907
>>>        (Merged 2019-06-20 for v2.3.4, but reverted)
>>>     3. https://github.com/apache/spark/pull/24908
>>>        (Merged 2019-06-21 for v2.4.4, but reverted)
>>>
>>> (2) and (3) were reverted before releases because we didn't want to fix
>>> that in the maintenance releases. Please see the following references of
>>> the decision.
>>>
>>>     https://github.com/apache/spark/pull/24908#issuecomment-504799028
>>> (2.3)
>>>     https://github.com/apache/spark/pull/24907#issuecomment-504799021
>>> (2.4)
>>>
>>> Now, there are some requests to revert SPARK-28093 and to keep these
>>> esoteric functions for backward compatibility and the following reason.
>>>
>>>     > Reordering function parameters to match another system,
>>>     > for a method that is otherwise working correctly,
>>>     > sounds exactly like a cosmetic change to me.
>>>
>>>     > How can we silently change the parameter of an existing SQL
>>> function?
>>>     > I don't think this is a correctness issue as the SQL standard
>>>     > doesn't say that the function signature have to be trim(srcStr,
>>> trimStr).
>>>
>>> The concern and the point of views make sense.
>>>
>>> My concerns are the followings.
>>>
>>>     1. This kind of esoteric differences are called `vendor-lock-in`
>>> stuffs in a negative way.
>>>        - It's difficult for new users to understand.
>>>        - It's hard to migrate between Apache Spark and the others.
>>>     2. Although we did our bests, Apache Spark SQL has not been enough
>>> always.
>>>     3. We need to do improvement in the future releases.
>>>
>>> In short, we can keep 3.0.0-preview behaviors here or revert SPARK-28093
>>> in order to keep this vendor-lock-in stuffs for backward-compatibility.
>>>
>>> What I want is building a consistent point of views in this category for
>>> the upcoming PR reviews.
>>>
>>> If we have a clear policy, we can save future community efforts in many
>>> ways.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

Reply via email to