Re: SQL TIMESTAMP semantics vs. SPARK-18350

Ofir Manor Wed, 24 May 2017 23:45:48 -0700

Hi Zoltan,
thanks for bringing this up, this is really important to me!
Personally, as a user developing apps on top of Spark and other tools, the
current timestamp semantics have been a source of some pain - needing to
undo Spark's "auto-correcting" of timestamps.
It would be really great if we could have standard timestamp handling, like
every other SQL-compliant database and processing engine (choosing between
the two main SQL types). I was under the impression that better SQL
compliance was one of the top priorities of the Spark project.
I guess it is pretty late in the release cycle - but it seems SPARK-18350
was just introduced a couple of weeks ago. Maybe it should be reverted to
unblock the 2.2 release, and a more proper solution could be implemented
for the next release after a more comprehensive discussion?
Just my two cents,


Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Wed, May 24, 2017 at 6:46 PM, Zoltan Ivanfi <z...@cloudera.com> wrote:

> Hi,
>
> Sorry if you receive this mail twice, it seems that my first attempt did
> not make it to the list for some reason.
>
> I would like to start a discussion about SPARK-18350
> <https://issues.apache.org/jira/browse/SPARK-18350> before it gets
> released because it seems to be going in a different direction than what
> other SQL engines of the Hadoop stack do.
>
> ANSI SQL defines the TIMESTAMP type (also known as TIMESTAMP WITHOUT TIME
> ZONE) to have timezone-agnostic semantics - basically a type that expresses
> readings from calendars and clocks and is unaffected by time zone. In the
> Hadoop stack, Impala has always worked like this and recently Presto also
> took steps <https://github.com/prestodb/presto/issues/7122> to become
> standards compliant. (Presto's design doc
> <https://docs.google.com/document/d/1UUDktZDx8fGwHZV4VyaEDQURorFbbg6ioeZ5KMHwoCk/edit>
> also contains a great summary of the different semantics.) Hive has a
> timezone-agnostic TIMESTAMP type as well (except for Parquet, a major
> source of incompatibility that is already being addressed
> <https://issues.apache.org/jira/browse/HIVE-12767>). A TIMESTAMP in
> SparkSQL, however, has UTC-normalized local time semantics (except for
> textfile), which is generally the semantics of the TIMESTAMP WITH TIME ZONE
> type.
>
> Given that timezone-agnostic TIMESTAMP semantics provide standards
> compliance and consistency with most SQL engines, I was wondering whether
> SparkSQL should also consider it in order to become ANSI SQL compliant and
> interoperable with other SQL engines of the Hadoop stack. Should SparkSQL
> adopt these semantics in the future, SPARK-18350
> <https://issues.apache.org/jira/browse/SPARK-18350> may turn out to be a
> source of problems. Please correct me if I'm wrong, but this change seems
> to explicitly assign TIMESTAMP WITH TIME ZONE semantics to the TIMESTAMP
> type. I think SPARK-18350 would be a great feature for a separate TIMESTAMP
> WITH TIME ZONE type, but the plain unqualified TIMESTAMP type would be
> better becoming timezone-agnostic instead of gaining further timezone-aware
> capabilities. (Of course becoming timezone-agnostic would be a behavior
> change, so it must be optional and configurable by the user, as in Presto.)
>
> I would like to hear your opinions about this concern and about TIMESTAMP
> semantics in general. Does the community agree that a standards-compliant
> and interoperable TIMESTAMP type is desired? Do you perceive SPARK-18350 as
> a potential problem in achieving this or do I misunderstand the effects of
> this change?
>
> Thanks,
>
> Zoltan
>
> ---
>
> List of links in case in-line links do not work:
>
>    - SPARK-18350: https://issues.apache.org/jira/browse/SPARK-18350
>    - Presto's change: https://github.com/prestodb/presto/issues/7122
>    - Presto's design doc:
>      https://docs.google.com/document/d/1UUDktZDx8fGwHZV4VyaEDQURorFbbg6ioeZ5KMHwoCk/edit
>
>
>
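
The two semantics contrasted in the quoted message can be illustrated with a
small sketch. This is not Spark code and does not claim to reproduce what
SPARK-18350 implements; it is a simplified model using only the Python
standard library. The function names and the two fixed-offset "session" zones
(UTC-8 for the writer, UTC+2 for the reader) are assumptions made up for the
illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones standing in for a writer and a reader session.
WRITER_TZ = timezone(timedelta(hours=-8))
READER_TZ = timezone(timedelta(hours=2))


def read_timezone_agnostic(wall_clock: datetime) -> datetime:
    """Timezone-agnostic model: the calendar/clock reading comes back unchanged."""
    return wall_clock


def store_utc_normalized(wall_clock: datetime, session_tz: timezone) -> datetime:
    """UTC-normalized model: interpret the reading in the writer's zone, keep the UTC instant."""
    return wall_clock.replace(tzinfo=session_tz).astimezone(timezone.utc)


def read_utc_normalized(stored_utc: datetime, session_tz: timezone) -> datetime:
    """Render the stored instant in the reader's zone, then drop the zone for display."""
    return stored_utc.astimezone(session_tz).replace(tzinfo=None)


# The same wall-clock value is written once and read back under both models.
written = datetime(2017, 5, 24, 18, 46, 0)

agnostic = read_timezone_agnostic(written)
normalized = read_utc_normalized(store_utc_normalized(written, WRITER_TZ), READER_TZ)

print("timezone-agnostic:", agnostic)
print("UTC-normalized:   ", normalized)
```

In this model the timezone-agnostic reader sees exactly the value that was
written, while the UTC-normalized reader sees a value shifted by the difference
between the two session zones. That shift is the kind of adjustment an
application has to undo when it expects the stored reading back verbatim.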
