This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new e226f68  [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound
e226f68 is described below

commit e226f687c172c63ce9ae6531772af9df124c9454
Author: Ben Ryves <benjamin.ry...@getyourguide.com>
AuthorDate: Tue Mar 31 15:16:17 2020 +0900

    [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound

    ### What changes were proposed in this pull request?

    A small documentation change to clarify that the `rand()` function produces values in `[0.0, 1.0)`.

    ### Why are the changes needed?

    `rand()` uses `Rand()` - which generates values in [0, 1) ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible value returned by rand (i.e for a distribution written as `X ~ U(a, b)`, x can be a or b, so `U[0.0, 1.0]` suggests the value returned could include 1.0).

    ### Does this PR introduce any user-facing change?

    Only documentation changes.

    ### How was this patch tested?

    Documentation changes only.

    Closes #28071 from Smeb/master.

    Authored-by: Ben Ryves <benjamin.ry...@getyourguide.com>
    Signed-off-by: HyukjinKwon <gurwls...@apache.org>
---
 R/pkg/R/functions.R                                          | 2 +-
 python/pyspark/sql/functions.py                              | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index e914dd3..09b0a21 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2614,7 +2614,7 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
 
 #' @details
 #' \code{rand}: Generates a random column with independent and identically distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
 #' Note: the function is non-deterministic in general case.
 #'
 #' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index b964980..c305529 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -553,7 +553,7 @@ def nanvl(col1, col2):
 @since(1.4)
 def rand(seed=None):
     """Generates a random column with independent and identically distributed (i.i.d.) samples
-    from U[0.0, 1.0].
+    uniformly distributed in [0.0, 1.0).
 
     .. note:: The function is non-deterministic in general case.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f419a38..21ad1fd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1224,7 +1224,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
    *
    * @note The function is non-deterministic in general case.
    *
@@ -1235,7 +1235,7 @@ object functions {
 
   /**
    * Generate a random column with independent and identically distributed (i.i.d.) samples
-   * from U[0.0, 1.0].
+   * uniformly distributed in [0.0, 1.0).
    *
    * @note The function is non-deterministic in general case.
    *
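
Not part of the commit: a minimal sketch of why the exclusive upper bound described above matters to callers. It assumes a caller who scales a sample from [0.0, 1.0) into an integer bucket index. It does not use Spark; Python's standard-library `random.Random.random()`, which also returns values in [0.0, 1.0), stands in for `rand()`. The helper name `bucket_index` and the bucket count of 4 are hypothetical.

```python
import math
import random


def bucket_index(x: float, num_buckets: int) -> int:
    """Map a sample x from [0.0, 1.0) to a bucket index in 0..num_buckets-1.

    Hypothetical helper for illustration; not a Spark API.
    """
    if not 0.0 <= x < 1.0:
        # With an inclusive bound, x == 1.0 would yield index == num_buckets,
        # one past the last valid bucket.
        raise ValueError(f"expected x in [0.0, 1.0), got {x!r}")
    return math.floor(x * num_buckets)


# Stand-in for rand(seed=42): same half-open range, NOT Spark's generator,
# so the actual values differ from what Spark would produce.
rng = random.Random(42)
samples = [rng.random() for _ in range(10_000)]
indices = [bucket_index(x, 4) for x in samples]

# The largest double strictly below 1.0 still lands in the last valid bucket.
largest_below_one = 1.0 - 2.0 ** -53
last_bucket = bucket_index(largest_below_one, 4)
```

Because 1.0 is never returned, every index stays within 0..3 and no clamping of the top bucket is needed; under the old `U[0.0, 1.0]` wording a reader might have added such a guard unnecessarily.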