Repository: spark
Updated Branches:
  refs/heads/branch-2.4 0cf4c5bbe -> 0b4e58187


[SPARK-23715][SQL][DOC] improve document for from/to_utc_timestamp

## What changes were proposed in this pull request?

We have an agreement that the behavior of `from/to_utc_timestamp` is corrected, although the function itself doesn't make much sense in Spark: https://issues.apache.org/jira/browse/SPARK-23715

This PR improves the document.

## How was this patch tested?

N/A

Closes #22543 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenc...@databricks.com>
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
(cherry picked from commit ff876137faba1802b66ecd483ba15f6ccd83ffc5)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0b4e5818
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0b4e5818
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0b4e5818

Branch: refs/heads/branch-2.4
Commit: 0b4e58187b787cc7a6d57a2a9d467934ece24252
Parents: 0cf4c5b
Author: Wenchen Fan <wenc...@databricks.com>
Authored: Thu Sep 27 15:02:20 2018 +0800
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Thu Sep 27 15:02:52 2018 +0800

----------------------------------------------------------------------
 R/pkg/R/functions.R                             | 26 +++++++++++++----
 python/pyspark/sql/functions.py                 | 30 ++++++++++++++++----
 .../expressions/datetimeExpressions.scala       | 30 ++++++++++++++++----
 3 files changed, 68 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/R/pkg/R/functions.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 572dee5..63bd427 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2203,9 +2203,16 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
           })
 
 #' @details
-#' \code{from_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1'
-#' would yield '2017-07-14 03:40:00.0'.
+#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
+#' timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
+#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+#' timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
+#' the given timezone.
+#' This function may return confusing result if the input is a string with timezone, e.g.
+#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
+#' timestamp according to the timezone in the string, and finally display the result by converting
+#' the timestamp to string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #'
@@ -2261,9 +2268,16 @@ setMethod("next_day", signature(y = "Column", x = "character"),
           })
 
 #' @details
-#' \code{to_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
-#' time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1'
-#' would yield '2017-07-14 01:40:00.0'.
+#' \code{to_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
+#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
+#' timestamp in the given timezone, and renders that timestamp as a timestamp in UTC.
+#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+#' timezone-agnostic. So in Spark this function just shift the timestamp value from the given
+#' timezone to UTC timezone.
+#' This function may return confusing result if the input is a string with timezone, e.g.
+#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
+#' timestamp according to the timezone in the string, and finally display the result by converting
+#' the timestamp to string according to the session local timezone.
 #'
 #' @rdname column_datetime_diff_functions
 #' @aliases to_utc_timestamp to_utc_timestamp,Column,character-method

http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 6da5237..8c54179 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1283,9 +1283,18 @@ def unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss'):
 @since(1.5)
 def from_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
-    that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
-    '2017-07-14 03:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+    takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and
+    renders that timestamp as a timestamp in the given time zone.
+
+    However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+    timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
+    the given timezone.
+
+    This function may return confusing result if the input is a string with timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that, Spark firstly cast the string to timestamp
+    according to the timezone in the string, and finally display the result by converting the
+    timestamp to string according to the session local timezone.
 
     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
@@ -1308,9 +1317,18 @@ def from_utc_timestamp(timestamp, tz):
 @since(1.5)
 def to_utc_timestamp(timestamp, tz):
     """
-    Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time
-    zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
-    '2017-07-14 01:40:00.0'.
+    This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+    takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in the given
+    timezone, and renders that timestamp as a timestamp in UTC.
+
+    However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+    timezone-agnostic. So in Spark this function just shift the timestamp value from the given
+    timezone to UTC timezone.
+
+    This function may return confusing result if the input is a string with timezone, e.g.
+    '2018-03-13T06:18:23+00:00'. The reason is that, Spark firstly cast the string to timestamp
+    according to the timezone in the string, and finally display the result by converting the
+    timestamp to string according to the session local timezone.
 
     :param timestamp: the column that contains timestamps
     :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc

http://git-wip-us.apache.org/repos/asf/spark/blob/0b4e5818/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index eb78e39..45e17ae 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -1018,9 +1018,18 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
 }
 
 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders
- * that time as a timestamp in the given time zone. For example, 'GMT+1' would yield
- * '2017-07-14 03:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+ * takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in UTC, and
+ * renders that timestamp as a timestamp in the given time zone.
+ *
+ * However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+ * timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
+ * the given timezone.
+ *
+ * This function may return confusing result if the input is a string with timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that, Spark firstly cast the string to timestamp
+ * according to the timezone in the string, and finally display the result by converting the
+ * timestamp to string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
@@ -1215,9 +1224,18 @@ case class MonthsBetween(
 }
 
 /**
- * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone,
- * and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield
- * '2017-07-14 01:40:00.0'.
+ * This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE. This function
+ * takes a timestamp which is timezone-agnostic, and interprets it as a timestamp in the given
+ * timezone, and renders that timestamp as a timestamp in UTC.
+ *
+ * However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
+ * timezone-agnostic. So in Spark this function just shift the timestamp value from the given
+ * timezone to UTC timezone.
+ *
+ * This function may return confusing result if the input is a string with timezone, e.g.
+ * '2018-03-13T06:18:23+00:00'. The reason is that, Spark firstly cast the string to timestamp
+ * according to the timezone in the string, and finally display the result by converting the
+ * timestamp to string according to the session local timezone.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
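
Illustrative note (not part of the commit): the "shift" semantics described in the new docstrings can be sketched in plain Python without Spark. This is a minimal model under stated assumptions, not Spark's implementation: the helper names `to_micros`, `render_utc`, `from_utc_shift` and `to_utc_shift` are hypothetical, the zone is modeled as a fixed offset (so daylight-saving rules are ignored), and the result is rendered in UTC rather than in a session local timezone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def to_micros(dt):
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // ONE_MICRO


def render_utc(micros):
    """Render an epoch-microsecond value as a wall-clock string in UTC."""
    return (EPOCH + timedelta(microseconds=micros)).strftime("%Y-%m-%d %H:%M:%S")


def from_utc_shift(micros, offset):
    """Hypothetical model of from_utc_timestamp: add the zone's UTC offset."""
    return micros + offset // ONE_MICRO


def to_utc_shift(micros, offset):
    """Hypothetical model of to_utc_timestamp: subtract the zone's UTC offset."""
    return micros - offset // ONE_MICRO


# The example from the removed docstring: '2017-07-14 02:40:00' with 'GMT+1'.
ts = to_micros(datetime(2017, 7, 14, 2, 40, tzinfo=timezone.utc))
gmt_plus_1 = timedelta(hours=1)  # assumed fixed offset standing in for 'GMT+1'

print(render_utc(from_utc_shift(ts, gmt_plus_1)))  # 2017-07-14 03:40:00
print(render_utc(to_utc_shift(ts, gmt_plus_1)))    # 2017-07-14 01:40:00
```

The point of the sketch is that the stored value itself changes by the offset. That is why, as the docstrings warn, an input string that already carries a timezone can look surprising once the shifted value is displayed in the session local timezone.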