The revert looks fine to me for keeping the compatibility. Also, I think the different orders between the systems easily lead to mistakes, so , as Wenchen suggested, it looks nice to make these (two parameters) trim functions unrecommended in future releases: https://github.com/apache/spark/pull/27540#discussion_r377682518 Actually, I think the SQL TRIM syntax is enough for trim use cases...
Bests, Takeshi On Sun, Feb 16, 2020 at 3:02 AM Maxim Gekk <maxim.g...@databricks.com> wrote: > Also if we look at possible combination of trim parameters: > 1. foldable srcStr + foldable trimStr > 2. foldable srcStr + non-foldable trimStr > 3. non-foldable srcStr + foldable trimStr > 4. non-foldable srcStr + non-foldable trimStr > > The case # 2 seems a rare case, and # 3 is probably the most common case. > Once we see the second case, we could output a warning with a notice about > the order of parameters. > > Maxim Gekk > > Software Engineer > > Databricks, Inc. > > > On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan <cloud0...@gmail.com> wrote: > >> It's unfortunate that we don't have a clear document to talk about >> breaking changes (I'm working on it BTW). I believe the general guidance >> is: *avoid breaking changes unless we have to*. For example, the >> previous result was so broken that we have to fix it, moving to Scala 2.12 >> makes us have to break some APIs, etc. >> >> For this particular case, do we have to switch the parameter order? It's >> different from some systems, the order was not decided explicitly, but I >> don't think they are strong reasons. People from RDBMS should use the SQL >> standard TRIM syntax more often. People using prior Spark versions should >> have figured out the parameter order of Spark TRIM (there was no document) >> and adjusted their queries. There is no such standard that defines the >> parameter order of the TRIM function. >> >> In the long term, we would also promote the SQL standard TRIM syntax. I >> don't see much benefit of "fixing" the parameter order that worth to make a >> breaking change. >> >> Thanks, >> Wenchen >> >> >> >> On Sat, Feb 15, 2020 at 3:44 AM Dongjoon Hyun <dongjoon.h...@gmail.com> >> wrote: >> >>> Please note that the context if TRIM/LTRIM/RTRIM with two-parameters and >>> TRIM(trimStr FROM str) syntax. >>> >>> This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM. >>> >>> On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com> >>> wrote: >>> >>>> Hi, All. >>>> >>>> I'm sending this email because the Apache Spark committers had better >>>> have a consistent point of views for the upcoming PRs. And, the community >>>> policy is the way to lead the community members transparently and clearly >>>> for a long term good. >>>> >>>> First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache >>>> Spark 3.0.0 is going to achieve many improvement in SQL areas. >>>> >>>> Especially, we invested lots of efforts to improve SQL ANSI compliance >>>> and related test coverage for the future. One simple example is the >>>> following. >>>> >>>> [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax >>>> >>>> With this support, we are able to move away from one of esoteric Spark >>>> function `TRIM/LTRIM/RTRIM`. >>>> (Note that the above syntax is ANSI standard, but we have additional >>>> non-standard these functions, too.) >>>> >>>> Here is the summary of the current status. >>>> >>>> +------------------------+-------------+---------------+ >>>> | SQL Progressing Engine | TRIM Syntax | TRIM Function | >>>> +------------------------------------------------------+ >>>> | Spark 3.0.0-preview/2 | O | O | >>>> | PostgreSQL | O | O | >>>> | Presto | O | O | >>>> | MySQL | O(*) | X | >>>> | Oracle | O | X | >>>> | MsSQL | O | X | >>>> | Terradata | O | X | >>>> | Hive | X | X | >>>> | Spark 1.6 ~ 2.2 | X | X | >>>> | Spark 2.3 ~ 2.4 | X | O(**) | >>>> +------------------------+-------------+---------------+ >>>> * MySQL doesn't allow multiple trim character >>>> * Spark 2.3 ~ 2.4 have the function in a different way. >>>> >>>> Here is the illustrative example of the problem. >>>> >>>> postgres=# SELECT trim('yxTomxx', 'xyz'); >>>> btrim >>>> ------- >>>> Tom >>>> >>>> presto:default> SELECT trim('yxTomxx', 'xyz'); >>>> _col0 >>>> ------- >>>> Tom >>>> >>>> spark-sql> SELECT trim('yxTomxx', 'xyz'); >>>> z >>>> >>>> Here is our history to fix the above issue. >>>> >>>> [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order >>>> issue >>>> 1. https://github.com/apache/spark/pull/24902 >>>> (Merged 2019-06-18 for v3.0.0, 3.0.0-preview and 3.0.0-preview2 >>>> released.) >>>> 2. https://github.com/apache/spark/pull/24907 >>>> (Merged 2019-06-20 for v2.3.4, but reverted) >>>> 3. https://github.com/apache/spark/pull/24908 >>>> (Merged 2019-06-21 for v2.4.4, but reverted) >>>> >>>> (2) and (3) were reverted before releases because we didn't want to fix >>>> that in the maintenance releases. Please see the following references of >>>> the decision. >>>> >>>> https://github.com/apache/spark/pull/24908#issuecomment-504799028 >>>> (2.3) >>>> https://github.com/apache/spark/pull/24907#issuecomment-504799021 >>>> (2.4) >>>> >>>> Now, there are some requests to revert SPARK-28093 and to keep these >>>> esoteric functions for backward compatibility and the following reason. >>>> >>>> > Reordering function parameters to match another system, >>>> > for a method that is otherwise working correctly, >>>> > sounds exactly like a cosmetic change to me. >>>> >>>> > How can we silently change the parameter of an existing SQL >>>> function? >>>> > I don't think this is a correctness issue as the SQL standard >>>> > doesn't say that the function signature have to be trim(srcStr, >>>> trimStr). >>>> >>>> The concern and the point of views make sense. >>>> >>>> My concerns are the followings. >>>> >>>> 1. This kind of esoteric differences are called `vendor-lock-in` >>>> stuffs in a negative way. >>>> - It's difficult for new users to understand. >>>> - It's hard to migrate between Apache Spark and the others. >>>> 2. Although we did our bests, Apache Spark SQL has not been enough >>>> always. >>>> 3. We need to do improvement in the future releases. >>>> >>>> In short, we can keep 3.0.0-preview behaviors here or revert >>>> SPARK-28093 in order to keep this vendor-lock-in stuffs for >>>> backward-compatibility. >>>> >>>> What I want is building a consistent point of views in this category >>>> for the upcoming PR reviews. >>>> >>>> If we have a clear policy, we can save future community efforts in many >>>> ways. >>>> >>>> Bests, >>>> Dongjoon. >>>> >>> -- --- Takeshi Yamamuro