[ https://issues.apache.org/jira/browse/SPARK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-10519:
-------------------------------------
Target Version/s: (was: 1.6.0)
> Investigate if we should encode timezone information to a timestamp value
> stored in JSON
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-10519
> URL: https://issues.apache.org/jira/browse/SPARK-10519
> Project: Spark
> Issue Type: Task
> Components: SQL
> Reporter: Yin Huai
> Priority: Minor
>
> Since Spark 1.3, we have stored timestamps in JSON without encoding timezone
> information, and the string representation of a timestamp stored in JSON
> implicitly uses the local timezone (see
> [1|https://github.com/apache/spark/blob/branch-1.3/sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala#L454],
> [2|https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/json/JacksonGenerator.scala#L38],
> [3|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41],
> [4|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L93]).
> This behavior can cause data consumers to get different values when they
> are in a different timezone from the data producers.
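To make the concern concrete, here is a minimal Java sketch (not Spark's JacksonGenerator code) that writes an instant as a zone-less wall-clock string in one timezone and reads it back in another. The class `LocalTimeRoundTrip`, the `yyyy-MM-dd HH:mm:ss` pattern, and the zones America/Los_Angeles and Asia/Tokyo are illustrative assumptions.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration: a timestamp written as local wall-clock time with no
// zone information, then read back by a consumer in a different zone.
class LocalTimeRoundTrip {
    // Zone-less pattern, similar in shape to java.sql.Timestamp#toString.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Producer side: render the instant as wall-clock time in the producer's zone.
    static String write(Instant instant, ZoneId producerZone) {
        return LocalDateTime.ofInstant(instant, producerZone).format(FMT);
    }

    // Consumer side: interpret the same string as wall-clock time in the consumer's zone.
    static Instant read(String text, ZoneId consumerZone) {
        return LocalDateTime.parse(text, FMT).atZone(consumerZone).toInstant();
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2015-09-09T12:00:00Z");
        String json = write(original, ZoneId.of("America/Los_Angeles"));
        Instant seenInTokyo = read(json, ZoneId.of("Asia/Tokyo"));
        System.out.println(json);        // 2015-09-09 05:00:00
        System.out.println(seenInTokyo); // 2015-09-08T20:00:00Z
    }
}
```

The Tokyo reader recovers an instant 16 hours earlier than the one written, which is the offset difference between the two zones (UTC-7 vs. UTC+9) on that date.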
> Since JSON is string-based, if we encode timezone information into the timestamp
> value, downstream applications may need to change their code (for example,
> java.sql.Timestamp.valueOf only supports the format {{yyyy-\[m]m-\[d]d hh:mm:ss\[.f...]}}).
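A quick Java check of this compatibility point, using a hypothetical helper `TimestampValueOfCheck.parses` and assumed sample strings: `java.sql.Timestamp.valueOf` accepts the current zone-less format but throws an `IllegalArgumentException` for strings that carry an offset or use the ISO 8601 `T`/`Z` form.

```java
import java.sql.Timestamp;

// Hypothetical helper: reports whether java.sql.Timestamp.valueOf accepts a string.
class TimestampValueOfCheck {
    static boolean parses(String s) {
        try {
            Timestamp.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) { // NumberFormatException is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2015-09-09 12:00:00"));       // true: current format
        System.out.println(parses("2015-09-09 12:00:00.123"));   // true: optional fraction
        System.out.println(parses("2015-09-09 12:00:00+09:00")); // false: offset appended
        System.out.println(parses("2015-09-09T12:00:00Z"));      // false: ISO 8601 with 'Z'
    }
}
```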
> We should investigate how to handle this issue. Right now, I can
> think of three options:
> 1. Encode timezone information in the timestamp value. This can break user code
> and may change the semantics of our timestamp type (our timestamp values are
> timezone-less).
> 2. When saving a timestamp value to JSON, treat the value as being in the
> local timezone and convert it to UTC. The saved value still does not encode
> timezone information.
> 3. Keep the current behavior, but state explicitly in our documentation
> that users need to use a single timezone for their datasets (e.g., always use
> UTC).
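Options 1 and 2 could be sketched in Java as follows. `TimestampJsonOptions` and its methods are hypothetical encoders written only for this comparison, not Spark APIs: option 1 writes ISO 8601 with the writer's offset via `java.time`, and option 2 converts a local wall-clock value to UTC and writes it in the existing zone-less pattern. The zone and sample values are assumptions.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical encoders for options 1 and 2; neither reflects actual Spark code.
class TimestampJsonOptions {
    static final DateTimeFormatter NO_ZONE = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Option 1: keep the writer's offset in the string, e.g. "2015-09-09T05:00:00-07:00".
    static String encodeWithOffset(Instant instant, ZoneId writerZone) {
        return instant.atZone(writerZone).toOffsetDateTime()
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Option 1 reader: the offset in the string fully determines the instant.
    static Instant decodeWithOffset(String text) {
        return OffsetDateTime.parse(text, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
    }

    // Option 2: treat the value as local time in the writer's zone, shift it to UTC,
    // and write it without any zone marker.
    static String encodeAsUtc(LocalDateTime localValue, ZoneId writerZone) {
        return localValue.atZone(writerZone)
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalDateTime()
                .format(NO_ZONE);
    }

    // Option 2 reader: must know by convention that the string is UTC.
    static Instant decodeAsUtc(String text) {
        return LocalDateTime.parse(text, NO_ZONE).toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        Instant t = Instant.parse("2015-09-09T12:00:00Z");

        String opt1 = encodeWithOffset(t, la);
        System.out.println(opt1);                   // 2015-09-09T05:00:00-07:00
        System.out.println(decodeWithOffset(opt1)); // 2015-09-09T12:00:00Z

        String opt2 = encodeAsUtc(LocalDateTime.of(2015, 9, 9, 5, 0, 0), la);
        System.out.println(opt2);                   // 2015-09-09 12:00:00
        System.out.println(decodeAsUtc(opt2));      // 2015-09-09T12:00:00Z
    }
}
```

The trade-off shows up directly: option 2's output keeps the format that java.sql.Timestamp.valueOf accepts, but readers must know to interpret it as UTC, while option 1's output is self-describing but is rejected by Timestamp.valueOf because it has no space separator.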
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)