Please note that the context if TRIM/LTRIM/RTRIM with two-parameters and TRIM(trimStr FROM str) syntax.
This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM. On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote: > Hi, All. > > I'm sending this email because the Apache Spark committers had better have > a consistent point of views for the upcoming PRs. And, the community policy > is the way to lead the community members transparently and clearly for a > long term good. > > First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache > Spark 3.0.0 is going to achieve many improvement in SQL areas. > > Especially, we invested lots of efforts to improve SQL ANSI compliance and > related test coverage for the future. One simple example is the following. > > [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax > > With this support, we are able to move away from one of esoteric Spark > function `TRIM/LTRIM/RTRIM`. > (Note that the above syntax is ANSI standard, but we have additional > non-standard these functions, too.) > > Here is the summary of the current status. > > +------------------------+-------------+---------------+ > | SQL Progressing Engine | TRIM Syntax | TRIM Function | > +------------------------------------------------------+ > | Spark 3.0.0-preview/2 | O | O | > | PostgreSQL | O | O | > | Presto | O | O | > | MySQL | O(*) | X | > | Oracle | O | X | > | MsSQL | O | X | > | Terradata | O | X | > | Hive | X | X | > | Spark 1.6 ~ 2.2 | X | X | > | Spark 2.3 ~ 2.4 | X | O(**) | > +------------------------+-------------+---------------+ > * MySQL doesn't allow multiple trim character > * Spark 2.3 ~ 2.4 have the function in a different way. > > Here is the illustrative example of the problem. > > postgres=# SELECT trim('yxTomxx', 'xyz'); > btrim > ------- > Tom > > presto:default> SELECT trim('yxTomxx', 'xyz'); > _col0 > ------- > Tom > > spark-sql> SELECT trim('yxTomxx', 'xyz'); > z > > Here is our history to fix the above issue. > > [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order issue > 1. https://github.com/apache/spark/pull/24902 > (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2 > released.) > 2. https://github.com/apache/spark/pull/24907 > (Merged 2019-06-20 for v2.3.4, but reverted) > 3. https://github.com/apache/spark/pull/24908 > (Merged 2019-06-21 for v2.4.4, but reverted) > > (2) and (3) were reverted before releases because we didn't want to fix > that in the maintenance releases. Please see the following references of > the decision. > > https://github.com/apache/spark/pull/24908#issuecomment-504799028 > (2.3) > https://github.com/apache/spark/pull/24907#issuecomment-504799021 > (2.4) > > Now, there are some requests to revert SPARK-28093 and to keep these > esoteric functions for backward compatibility and the following reason. > > > Reordering function parameters to match another system, > > for a method that is otherwise working correctly, > > sounds exactly like a cosmetic change to me. > > > How can we silently change the parameter of an existing SQL function? > > I don't think this is a correctness issue as the SQL standard > > doesn't say that the function signature have to be trim(srcStr, > trimStr). > > The concern and the point of views make sense. > > My concerns are the followings. > > 1. This kind of esoteric differences are called `vendor-lock-in` > stuffs in a negative way. > - It's difficult for new users to understand. > - It's hard to migrate between Apache Spark and the others. > 2. Although we did our bests, Apache Spark SQL has not been enough > always. > 3. We need to do improvement in the future releases. > > In short, we can keep 3.0.0-preview behaviors here or revert SPARK-28093 > in order to keep this vendor-lock-in stuffs for backward-compatibility. > > What I want is building a consistent point of views in this category for > the upcoming PR reviews. > > If we have a clear policy, we can save future community efforts in many > ways. > > Bests, > Dongjoon. >