[
https://issues.apache.org/jira/browse/SPARK-35662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-35662:
-----------------------------------
Affects Version/s: 3.3.0 (was: 3.2.0)
> Support Timestamp without time zone data type
> ---------------------------------------------
>
> Key: SPARK-35662
> URL: https://issues.apache.org/jira/browse/SPARK-35662
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Gengliang Wang
> Assignee: Apache Spark
> Priority: Major
>
> Spark SQL today supports the TIMESTAMP data type. However, the semantics
> provided actually match TIMESTAMP WITH LOCAL TIME ZONE as defined by Oracle.
> Timestamps embedded in a SQL query or passed through JDBC are presumed to be
> in the session-local time zone and are converted to UTC before being processed.
> These are desirable semantics in many cases, such as when dealing with
> calendars.
> In many (more) other cases, such as when dealing with log files, it is
> desirable that the provided timestamps not be altered.
> SQL users expect that they can model either behavior and do so by using
> TIMESTAMP WITHOUT TIME ZONE for time zone insensitive data and TIMESTAMP WITH
> LOCAL TIME ZONE for time zone sensitive data.
> Most traditional RDBMSs map TIMESTAMP to TIMESTAMP WITHOUT TIME ZONE, so their
> users will be surprised to see TIMESTAMP WITH LOCAL TIME ZONE, a feature that
> does not exist in the SQL standard.
> In this new feature, we will introduce TIMESTAMP WITH LOCAL TIME ZONE to
> describe the existing timestamp type and add TIMESTAMP WITHOUT TIME ZONE for
> the standard semantics.
> Using these two types will provide clarity.
> We will also allow users to set the default behavior for TIMESTAMP to either
> use TIMESTAMP WITH LOCAL TIME ZONE or TIMESTAMP WITHOUT TIME ZONE.
> h3. Milestone 1 – Spark Timestamp equivalency (the new Timestamp type
> TimestampWithoutTZ meets or exceeds all functionality of the existing SQL
> Timestamp):
> * Add a new DataType implementation for TimestampWithoutTZ.
> * Support TimestampWithoutTZ in Dataset/UDF.
> * TimestampWithoutTZ literals
> * TimestampWithoutTZ arithmetic (e.g. TimestampWithoutTZ -
> TimestampWithoutTZ, TimestampWithoutTZ - Date)
> * Datetime functions/operators: dayofweek, weekofyear, year, etc.
> * Cast to and from TimestampWithoutTZ: cast String/Timestamp to
> TimestampWithoutTZ, and cast TimestampWithoutTZ to String (pretty
> printing)/Timestamp, with SQL syntax to specify the types
> * Support sorting TimestampWithoutTZ.
> h3. Milestone 2 – Persistence:
> * Ability to create tables of type TimestampWithoutTZ
> * Ability to write to common file formats such as Parquet and JSON.
> * INSERT, SELECT, UPDATE, MERGE
> * Discovery
> h3. Milestone 3 – Client support
> * JDBC support
> * Hive Thrift server
> h3. Milestone 4 – PySpark and SparkR integration
> * Python UDF can take and return TimestampWithoutTZ
> * DataFrame support
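
The difference between the two semantics described in the issue can be sketched in a few lines of plain Python. This is only an illustration using the standard `datetime` module, not Spark's API: the function names `parse_with_local_time_zone` and `parse_without_time_zone` are hypothetical, and the two session time zones are modelled as fixed UTC offsets (an assumption that ignores daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session time zones, modelled as fixed UTC offsets (no DST).
LOS_ANGELES = timezone(timedelta(hours=-8))
BERLIN = timezone(timedelta(hours=1))

FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_with_local_time_zone(text: str, session_tz: timezone) -> datetime:
    """Interpret the literal in the session time zone, then normalize to UTC."""
    wall_clock = datetime.strptime(text, FORMAT)
    return wall_clock.replace(tzinfo=session_tz).astimezone(timezone.utc)


def parse_without_time_zone(text: str) -> datetime:
    """Keep the literal as a plain wall-clock value; no session zone is applied."""
    return datetime.strptime(text, FORMAT)


literal = "2021-06-07 12:00:00"

# WITH LOCAL TIME ZONE: the same literal becomes a different instant
# depending on the session that parsed it.
stored_la = parse_with_local_time_zone(literal, LOS_ANGELES)
stored_berlin = parse_with_local_time_zone(literal, BERLIN)

# A value written in the Los Angeles session is displayed with a shifted
# wall clock when read back in the Berlin session.
shown_in_berlin = stored_la.astimezone(BERLIN).replace(tzinfo=None)

# WITHOUT TIME ZONE: the value is the same in every session.
log_time = parse_without_time_zone(literal)

print(stored_la.isoformat())
print(stored_berlin.isoformat())
print(shown_in_berlin.isoformat(sep=" "))
print(log_time.isoformat(sep=" "))
```

Under these assumptions, the literal written in the Los Angeles session reads back as 21:00 in the Berlin session, while the value parsed without a time zone stays at 12:00. That is the log-file case the description calls out: the provided timestamp is not altered.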