RTRIM`

Takeshi Yamamuro Sun, 16 Feb 2020 01:22:19 -0800

The revert looks fine to me for keeping the compatibility.
Also, I think the different orders between the systems easily lead to
mistakes, so
, as Wenchen suggested, it looks nice to make these (two parameters) trim
functions
unrecommended in future releases:
https://github.com/apache/spark/pull/27540#discussion_r377682518
Actually, I think the SQL TRIM syntax is enough for trim use cases...


Bests,
Takeshi


On Sun, Feb 16, 2020 at 3:02 AM Maxim Gekk <maxim.g...@databricks.com>
wrote:

> Also if we look at possible combination of trim parameters:
> 1. foldable srcStr + foldable trimStr
> 2. foldable srcStr + non-foldable trimStr
> 3. non-foldable srcStr + foldable trimStr
> 4. non-foldable srcStr + non-foldable trimStr
>
> The case # 2 seems a rare case, and # 3 is probably the most common case.
> Once we see the second case, we could output a warning with a notice about
> the order of parameters.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> It's unfortunate that we don't have a clear document to talk about
>> breaking changes (I'm working on it BTW). I believe the general guidance
>> is: *avoid breaking changes unless we have to*. For example, the
>> previous result was so broken that we have to fix it, moving to Scala 2.12
>> makes us have to break some APIs, etc.
>>
>> For this particular case, do we have to switch the parameter order? It's
>> different from some systems, the order was not decided explicitly, but I
>> don't think they are strong reasons. People from RDBMS should use the SQL
>> standard TRIM syntax more often. People using prior Spark versions should
>> have figured out the parameter order of Spark TRIM (there was no document)
>> and adjusted their queries. There is no such standard that defines the
>> parameter order of the TRIM function.
>>
>> In the long term, we would also promote the SQL standard TRIM syntax. I
>> don't see much benefit of "fixing" the parameter order that worth to make a
>> breaking change.
>>
>> Thanks,
>> Wenchen
>>
>>
>>
>> On Sat, Feb 15, 2020 at 3:44 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Please note that the context if TRIM/LTRIM/RTRIM with two-parameters and
>>> TRIM(trimStr FROM str) syntax.
>>>
>>> This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM.
>>>
>>> On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, All.
>>>>
>>>> I'm sending this email because the Apache Spark committers had better
>>>> have a consistent point of views for the upcoming PRs. And, the community
>>>> policy is the way to lead the community members transparently and clearly
>>>> for a long term good.
>>>>
>>>> First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache
>>>> Spark 3.0.0 is going to achieve many improvement in SQL areas.
>>>>
>>>> Especially, we invested lots of efforts to improve SQL ANSI compliance
>>>> and related test coverage for the future. One simple example is the
>>>> following.
>>>>
>>>>     [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
>>>>
>>>> With this support, we are able to move away from one of esoteric Spark
>>>> function `TRIM/LTRIM/RTRIM`.
>>>> (Note that the above syntax is ANSI standard, but we have additional
>>>> non-standard these functions, too.)
>>>>
>>>> Here is the summary of the current status.
>>>>
>>>>     +------------------------+-------------+---------------+
>>>>     | SQL Progressing Engine | TRIM Syntax | TRIM Function |
>>>>     +------------------------------------------------------+
>>>>     | Spark 3.0.0-preview/2  |      O      |       O       |
>>>>     | PostgreSQL             |      O      |       O       |
>>>>     | Presto                 |      O      |       O       |
>>>>     | MySQL                  |      O(*)   |       X       |
>>>>     | Oracle                 |      O      |       X       |
>>>>     | MsSQL                  |      O      |       X       |
>>>>     | Terradata              |      O      |       X       |
>>>>     | Hive                   |      X      |       X       |
>>>>     | Spark 1.6 ~ 2.2        |      X      |       X       |
>>>>     | Spark 2.3 ~ 2.4        |      X      |       O(**)   |
>>>>     +------------------------+-------------+---------------+
>>>>     * MySQL doesn't allow multiple trim character
>>>>     * Spark 2.3 ~ 2.4 have the function in a different way.
>>>>
>>>> Here is the illustrative example of the problem.
>>>>
>>>>     postgres=# SELECT trim('yxTomxx', 'xyz');
>>>>     btrim
>>>>     -------
>>>>     Tom
>>>>
>>>>     presto:default> SELECT trim('yxTomxx', 'xyz');
>>>>     _col0
>>>>     -------
>>>>     Tom
>>>>
>>>>     spark-sql> SELECT trim('yxTomxx', 'xyz');
>>>>     z
>>>>
>>>> Here is our history to fix the above issue.
>>>>
>>>>     [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order
>>>> issue
>>>>     1. https://github.com/apache/spark/pull/24902
>>>>        (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2
>>>> released.)
>>>>     2. https://github.com/apache/spark/pull/24907
>>>>        (Merged 2019-06-20 for v2.3.4, but reverted)
>>>>     3. https://github.com/apache/spark/pull/24908
>>>>        (Merged 2019-06-21 for v2.4.4, but reverted)
>>>>
>>>> (2) and (3) were reverted before releases because we didn't want to fix
>>>> that in the maintenance releases. Please see the following references of
>>>> the decision.
>>>>
>>>>     https://github.com/apache/spark/pull/24908#issuecomment-504799028
>>>> (2.3)
>>>>     https://github.com/apache/spark/pull/24907#issuecomment-504799021
>>>> (2.4)
>>>>
>>>> Now, there are some requests to revert SPARK-28093 and to keep these
>>>> esoteric functions for backward compatibility and the following reason.
>>>>
>>>>     > Reordering function parameters to match another system,
>>>>     > for a method that is otherwise working correctly,
>>>>     > sounds exactly like a cosmetic change to me.
>>>>
>>>>     > How can we silently change the parameter of an existing SQL
>>>> function?
>>>>     > I don't think this is a correctness issue as the SQL standard
>>>>     > doesn't say that the function signature have to be trim(srcStr,
>>>> trimStr).
>>>>
>>>> The concern and the point of views make sense.
>>>>
>>>> My concerns are the followings.
>>>>
>>>>     1. This kind of esoteric differences are called `vendor-lock-in`
>>>> stuffs in a negative way.
>>>>        - It's difficult for new users to understand.
>>>>        - It's hard to migrate between Apache Spark and the others.
>>>>     2. Although we did our bests, Apache Spark SQL has not been enough
>>>> always.
>>>>     3. We need to do improvement in the future releases.
>>>>
>>>> In short, we can keep 3.0.0-preview behaviors here or revert
>>>> SPARK-28093 in order to keep this vendor-lock-in stuffs for
>>>> backward-compatibility.
>>>>
>>>> What I want is building a consistent point of views in this category
>>>> for the upcoming PR reviews.
>>>>
>>>> If we have a clear policy, we can save future community efforts in many
>>>> ways.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>

-- 
---
Takeshi Yamamuro

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

Reply via email to