[
https://issues.apache.org/jira/browse/SPARK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992678#comment-14992678
]
Michael Armbrust commented on SPARK-10519:
------------------------------------------
Is this a moot point now that timestamp is Tungsten encoded (and thus doesn't
have timezone information that we could write out)? You are only going to be
able to write out whatever the system default is, which may or may not be
the actual timezone of the value.
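
To illustrate the point, here is a minimal sketch, not taken from the Spark
code base. It assumes the encoded timestamp is a bare count of microseconds
since the Unix epoch; the class and method names are hypothetical, and fixed
UTC offsets stand in for the writer's system default timezone. The same stored
value renders as a different string for each writer, and nothing in the value
says which rendering is the intended one.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration class; not part of Spark.
class ZonelessTimestampDemo {
    static final DateTimeFormatter PLAIN =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Renders an epoch-microsecond value as a zone-less string, using the
    // given offset as a stand-in for the writer's default timezone.
    static String render(long epochMicros, ZoneOffset writerOffset) {
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        int nanos = (int) (Math.floorMod(epochMicros, 1_000_000L) * 1_000L);
        return LocalDateTime.ofEpochSecond(seconds, nanos, writerOffset).format(PLAIN);
    }

    public static void main(String[] args) {
        long epochMicros = 1_446_724_800_000_000L; // 2015-11-05 12:00:00 UTC

        System.out.println(render(epochMicros, ZoneOffset.UTC));         // 2015-11-05 12:00:00
        System.out.println(render(epochMicros, ZoneOffset.ofHours(-8))); // 2015-11-05 04:00:00
        System.out.println(render(epochMicros, ZoneOffset.ofHours(9)));  // 2015-11-05 21:00:00
    }
}
```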
> Investigate if we should encode timezone information to a timestamp value
> stored in JSON
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-10519
> URL: https://issues.apache.org/jira/browse/SPARK-10519
> Project: Spark
> Issue Type: Task
> Components: SQL
> Reporter: Yin Huai
> Priority: Minor
>
> Since Spark 1.3, we store a timestamp in JSON without encoding the timezone
> information, and the string representation of a timestamp stored in JSON
> implicitly uses the local timezone (see
> [1|https://github.com/apache/spark/blob/branch-1.3/sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala#L454],
>
> [2|https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/json/JacksonGenerator.scala#L38],
>
> [3|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41],
>
> [4|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L93]).
> This behavior may cause data consumers to get different values when they
> are in a different timezone from the data producers.
> Since JSON is string based, if we encode timezone information to timestamp
> value, downstream applications may need to change their code (for example,
> java.sql.Timestamp.valueOf only supports the format of {{yyyy-\[m]m-\[d]d
> hh:mm:ss\[.f...]}}).
> We should investigate what we should do about this issue. Right now, I can
> think of three options:
> 1. Encoding timezone info in the timestamp value, which can break user code
> and may change the semantics of timestamp (our timestamp value is
> timezone-less).
> 2. When saving a timestamp value to JSON, we treat this value as a value in
> the local timezone and convert it to UTC time. Then, when saving the data, we
> do not encode timezone info in the value.
> 3. We do not change our current behavior. But, in our doc, we explicitly say
> that users need to use a single timezone for their datasets (e.g. always use
> UTC time).
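
The trade-off between options 1 and 2 can be sketched as follows. This is an
illustration under stated assumptions, not Spark's implementation: the class
and helper names are hypothetical, a fixed UTC offset stands in for the
writer's local timezone, and ISO-8601 with an offset is only one possible
encoding for option 1. The one library fact relied on is that
java.sql.Timestamp.valueOf throws IllegalArgumentException for input outside
its documented format.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration class; not part of Spark.
class JsonTimestampOptions {
    static final DateTimeFormatter PLAIN =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Option 1 (one possible encoding): keep the wall-clock value and
    // append the writer's offset.
    static String withOffset(LocalDateTime local, ZoneOffset writerOffset) {
        return local.atOffset(writerOffset)
                    .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Option 2: treat the value as local time, shift it to UTC, and write
    // it without any timezone information.
    static String toUtcString(LocalDateTime local, ZoneOffset writerOffset) {
        return local.atOffset(writerOffset)
                    .withOffsetSameInstant(ZoneOffset.UTC)
                    .toLocalDateTime()
                    .format(PLAIN);
    }

    // True if a downstream consumer could read the string with
    // java.sql.Timestamp.valueOf.
    static boolean parsesWithValueOf(String s) {
        try {
            Timestamp.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2015, 11, 5, 4, 0, 0);
        ZoneOffset writer = ZoneOffset.ofHours(-8);

        String option1 = withOffset(local, writer);
        String option2 = toUtcString(local, writer);

        System.out.println(option1 + " parses: " + parsesWithValueOf(option1));
        System.out.println(option2 + " parses: " + parsesWithValueOf(option2));
    }
}
```

Under these assumptions, the option 2 string stays readable by existing
Timestamp.valueOf callers, while the option 1 string is rejected. Option 2
still depends on every consumer knowing that the stored values are in UTC.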