Repository: spark
Updated Branches:
  refs/heads/branch-2.3 f87785a76 -> 3a22feab4


[SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starting position by 1 when calling Scala API

## What changes were proposed in this pull request?

This PR backports 
https://github.com/apache/spark/commit/24b5c69ee3feded439e5bb6390e4b63f503eeafe 
and https://github.com/apache/spark/pull/21249

There's no conflict, but I opened this PR just to run the tests and make sure.

See the discussion in https://issues.apache.org/jira/browse/SPARK-23291

## How was this patch tested?

Jenkins tests.

Author: hyukjinkwon <[email protected]>
Author: Liang-Chi Hsieh <[email protected]>

Closes #21250 from HyukjinKwon/SPARK-23291-backport.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3a22feab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3a22feab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3a22feab

Branch: refs/heads/branch-2.3
Commit: 3a22feab4dc9f0cffe3aaec692e27ab277666507
Parents: f87785a
Author: hyukjinkwon <[email protected]>
Authored: Mon May 7 14:48:28 2018 -0700
Committer: Yanbo Liang <[email protected]>
Committed: Mon May 7 14:48:28 2018 -0700

----------------------------------------------------------------------
 R/pkg/R/column.R                      | 10 ++++++++--
 R/pkg/tests/fulltests/test_sparkSQL.R |  1 +
 docs/sparkr.md                        |  4 ++++
 3 files changed, 13 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3a22feab/R/pkg/R/column.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/column.R b/R/pkg/R/column.R
index 3095adb..3d6d9f9 100644
--- a/R/pkg/R/column.R
+++ b/R/pkg/R/column.R
@@ -164,12 +164,18 @@ setMethod("alias",
 #' @aliases substr,Column-method
 #'
 #' @param x a Column.
-#' @param start starting position.
+#' @param start starting position. It should be 1-base.
 #' @param stop ending position.
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(a="abcdef")))
+#' collect(select(df, substr(df$a, 1, 4))) # the result is `abcd`.
+#' collect(select(df, substr(df$a, 2, 4))) # the result is `bcd`.
+#' }
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
           function(x, start, stop) {
-            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
+            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
             column(jc)
           })
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3a22feab/R/pkg/tests/fulltests/test_sparkSQL.R
----------------------------------------------------------------------
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 5197838..bed26ec 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -1649,6 +1649,7 @@ test_that("string operators", {
   expect_false(first(select(df, startsWith(df$name, "m")))[[1]])
   expect_true(first(select(df, endsWith(df$name, "el")))[[1]])
   expect_equal(first(select(df, substr(df$name, 1, 2)))[[1]], "Mi")
+  expect_equal(first(select(df, substr(df$name, 4, 6)))[[1]], "hae")
   if (as.numeric(R.version$major) >= 3 && as.numeric(R.version$minor) >= 3) {
     expect_true(startsWith("Hello World", "Hello"))
     expect_false(endsWith("Hello World", "a"))

http://git-wip-us.apache.org/repos/asf/spark/blob/3a22feab/docs/sparkr.md
----------------------------------------------------------------------
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 6685b58..73f9424 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -663,3 +663,7 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
  - The `stringsAsFactors` parameter was previously ignored with `collect`, for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It has been corrected.
  - For `summary`, option for statistics to compute has been added. Its output is changed from that from `describe`.
  - A warning can be raised if versions of SparkR package and the Spark JVM do not match.
+
+## Upgrading to SparkR 2.3.1 and above
+
+ - In SparkR 2.3.0 and earlier, the `start` parameter of `substr` method was wrongly subtracted by one and considered as 0-based. This can lead to inconsistent substring results and also does not match with the behaviour with `substr` in R. In version 2.3.1 and later, it has been fixed so the `start` parameter of `substr` method is now 1-base. As an example, `substr(lit('abcdef'), 2, 4))` would result to `abc` in SparkR 2.3.0, and the result would be `bcd` in SparkR 2.3.1.
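
The behaviour change described in the migration note can be reproduced without a Spark session. The following is a minimal sketch in base R, not Spark code: `jvm_substr`, `substr_old` and `substr_new` are hypothetical names introduced here, and `jvm_substr` only models the assumption that the JVM-side `substr(startPos, len)` takes a 1-based start position and treats a start position of 0 like 1. Negative positions are not modelled.

```r
# Hypothetical stand-in for the JVM-side substr(startPos, len); not Spark code.
# Assumption: startPos is 1-based, and a startPos of 0 behaves like 1.
jvm_substr <- function(s, pos, len) {
  first <- if (pos > 0) pos else 1L  # negative positions are out of scope here
  substring(s, first, first + len - 1L)
}

# Wrapper arithmetic before the fix: start was reduced by 1.
substr_old <- function(s, start, stop) {
  jvm_substr(s, start - 1L, stop - start + 1L)
}

# Wrapper arithmetic after the fix: start is passed through unchanged.
substr_new <- function(s, start, stop) {
  jvm_substr(s, start, stop - start + 1L)
}

print(substr_old("abcdef", 2, 4))   # "abc", the 2.3.0 result from the note
print(substr_new("abcdef", 2, 4))   # "bcd", the 2.3.1 result from the note
print(substr_old("Michael", 1, 2))  # "Mi", unaffected because start - 1 is 0
print(substr_new("Michael", 4, 6))  # "hae", the case added to the test suite
```

Under that assumption, a call with `start = 1` returns the same result before and after the fix, which would explain why the existing `substr(df$name, 1, 2)` test did not catch the problem and why the new test uses `start = 4`.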

