Also, if we look at the possible combinations of trim parameters:

1. foldable srcStr + foldable trimStr
2. foldable srcStr + non-foldable trimStr
3. non-foldable srcStr + foldable trimStr
4. non-foldable srcStr + non-foldable trimStr

Case #2 seems rare, and #3 is probably the most common case. Whenever we see
the second case, we could output a warning with a notice about the order of
the parameters.

Maxim Gekk
Software Engineer
Databricks, Inc.

On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> It's unfortunate that we don't have a clear document about breaking
> changes (I'm working on it, BTW). I believe the general guidance is:
> *avoid breaking changes unless we have to*. For example, the previous
> result was so broken that we had to fix it, moving to Scala 2.12 made us
> break some APIs, etc.
>
> For this particular case, do we have to switch the parameter order? It's
> different from some systems, and the order was not decided explicitly,
> but I don't think those are strong reasons. People from RDBMSs should use
> the SQL standard TRIM syntax more often. People using prior Spark
> versions should have figured out the parameter order of Spark TRIM (there
> was no document) and adjusted their queries. There is no standard that
> defines the parameter order of the TRIM function.
>
> In the long term, we would also promote the SQL standard TRIM syntax. I
> don't see much benefit in "fixing" the parameter order that is worth
> making a breaking change.
>
> Thanks,
> Wenchen
>
> On Sat, Feb 15, 2020 at 3:44 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Please note that the context is TRIM/LTRIM/RTRIM with two parameters
>> and the TRIM(trimStr FROM str) syntax.
>>
>> This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM.
>>
>> On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> I'm sending this email because the Apache Spark committers had better
>>> have a consistent point of view for the upcoming PRs. And, a community
>>> policy is the way to lead the community members transparently and
>>> clearly for the long-term good.
>>>
>>> First of all, I want to emphasize that, like Apache Spark 2.0.0, Apache
>>> Spark 3.0.0 is going to achieve many improvements in SQL areas.
>>>
>>> Especially, we invested a lot of effort to improve SQL ANSI compliance
>>> and the related test coverage for the future. One simple example is the
>>> following.
>>>
>>>     [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax
>>>
>>> With this support, we are able to move away from the esoteric Spark
>>> functions `TRIM/LTRIM/RTRIM`.
>>> (Note that the above syntax is the ANSI standard, but we also have these
>>> additional non-standard functions.)
>>>
>>> Here is the summary of the current status.
>>>
>>>     +------------------------+-------------+---------------+
>>>     | SQL Processing Engine  | TRIM Syntax | TRIM Function |
>>>     +------------------------+-------------+---------------+
>>>     | Spark 3.0.0-preview/2  |      O      |       O       |
>>>     | PostgreSQL             |      O      |       O       |
>>>     | Presto                 |      O      |       O       |
>>>     | MySQL                  |     O(*)    |       X       |
>>>     | Oracle                 |      O      |       X       |
>>>     | MsSQL                  |      O      |       X       |
>>>     | Teradata               |      O      |       X       |
>>>     | Hive                   |      X      |       X       |
>>>     | Spark 1.6 ~ 2.2        |      X      |       X       |
>>>     | Spark 2.3 ~ 2.4        |      X      |     O(**)     |
>>>     +------------------------+-------------+---------------+
>>>     (*)  MySQL doesn't allow multiple trim characters.
>>>     (**) Spark 2.3 ~ 2.4 have the function with a different
>>>          parameter order.
>>>
>>> Here is an illustrative example of the problem.
>>>
>>>     postgres=# SELECT trim('yxTomxx', 'xyz');
>>>      btrim
>>>     -------
>>>      Tom
>>>
>>>     presto:default> SELECT trim('yxTomxx', 'xyz');
>>>      _col0
>>>     -------
>>>      Tom
>>>
>>>     spark-sql> SELECT trim('yxTomxx', 'xyz');
>>>     z
>>>
>>> Here is our history of fixing the above issue.
>>>
>>>     [SPARK-28093][SQL] Fix TRIM/LTRIM/RTRIM function parameter order
>>>     issue
>>>     1. https://github.com/apache/spark/pull/24902
>>>        (Merged 2019-06-18 for v3.0.0; 3.0.0-preview and 3.0.0-preview2
>>>        released.)
>>>     2. https://github.com/apache/spark/pull/24907
>>>        (Merged 2019-06-20 for v2.3.4, but reverted)
>>>     3. https://github.com/apache/spark/pull/24908
>>>        (Merged 2019-06-21 for v2.4.4, but reverted)
>>>
>>> (2) and (3) were reverted before the releases because we didn't want to
>>> fix that in the maintenance releases. Please see the following
>>> references for the decision.
>>>
>>>     https://github.com/apache/spark/pull/24908#issuecomment-504799028 (2.3)
>>>     https://github.com/apache/spark/pull/24907#issuecomment-504799021 (2.4)
>>>
>>> Now, there are some requests to revert SPARK-28093 and to keep these
>>> esoteric functions for backward compatibility, for the following
>>> reasons.
>>>
>>> > Reordering function parameters to match another system,
>>> > for a method that is otherwise working correctly,
>>> > sounds exactly like a cosmetic change to me.
>>>
>>> > How can we silently change the parameters of an existing SQL function?
>>> > I don't think this is a correctness issue, as the SQL standard
>>> > doesn't say that the function signature has to be
>>> > trim(srcStr, trimStr).
>>>
>>> These concerns and points of view make sense.
>>>
>>> My concerns are the following.
>>>
>>>     1. These kinds of esoteric differences are called `vendor lock-in`
>>>        in a negative way.
>>>        - It's difficult for new users to understand.
>>>        - It's hard to migrate between Apache Spark and the others.
>>>     2. Although we did our best, Apache Spark SQL has not always been
>>>        sufficient.
>>>     3. We need to make improvements in the future releases.
>>>
>>> In short, we can keep the 3.0.0-preview behavior here, or revert
>>> SPARK-28093 in order to keep this vendor lock-in behavior for backward
>>> compatibility.
>>>
>>> What I want is to build a consistent point of view in this category for
>>> the upcoming PR reviews.
>>>
>>> If we have a clear policy, we can save future community efforts in many
>>> ways.
>>>
>>> Bests,
>>> Dongjoon.
>>
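For readers following the thread, the argument-order difference discussed above is easy to reproduce outside of Spark. The sketch below uses a hypothetical helper `btrim` (the name borrowed from PostgreSQL's output column; it is not a Spark API) that mimics ANSI `TRIM(BOTH trimStr FROM srcStr)` semantics with Python's `str.strip`, which removes any character in the given set from both ends:

```python
# Hypothetical helper mimicking ANSI TRIM(BOTH trimStr FROM srcStr):
# strip any character that appears in trim_chars from both ends of src.
def btrim(src, trim_chars):
    return src.strip(trim_chars)

# Standard argument order (PostgreSQL, Presto, Spark 3.0.0-preview):
# trim 'x', 'y', 'z' from both ends of 'yxTomxx'.
print(btrim('yxTomxx', 'xyz'))   # -> Tom

# Spark 2.3 ~ 2.4 effectively evaluated trim('yxTomxx', 'xyz') with the
# arguments swapped, i.e. trimming {'y','x','T','o','m'} from 'xyz':
print(btrim('xyz', 'yxTomxx'))   # -> z
```

This reproduces the `Tom` vs. `z` results shown in the postgres/presto/spark-sql excerpts earlier in the thread, and illustrates why swapped arguments yield a superficially plausible (non-error) result rather than failing loudly.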