Hi,

Although the draft of SQL-92[1] does not explicitly state that the time
zone offset has to be stored, the following excerpts strongly suggest that
the time zone has to be stored with each individual value of TIMESTAMP WITH
TIME ZONE:

  The length of a TIMESTAMP is 19 positions [...]
  The length of a TIMESTAMP WITH TIME ZONE is 25 positions [...]

The draft of SQL 2003[2] standard is more specific:

  TIMESTAMP and TIME may also be specified as being WITH TIME ZONE, in
which case every value has associated with it a time zone displacement.

However, the TIMESTAMP WITH TIME ZONE type in PostgreSQL does not conform
to this definition, but behaves like the plain TIMESTAMP type of the SQL
specification. Oracle calls this type TIMESTAMP WITH LOCAL TIME instead.

It's easier to get an overview on this chaos if we list the different
interpretations and data types:

Timestamp stored with a time zone:
- ANSI SQL type: TIMESTAMP WITH TIME ZONE
- Oracle type: TIMESTAMP WITH TIME ZONE
- PostgeSQL type: -

Timestamp stored without a time zone that represents a time in UTC
(automatically converted to/from local time):
- ANSI SQL type: TIMESTAMP
- Oracle type: TIMESTAMP WITH LOCAL TIME
- PostgeSQL type: TIMESTAMP WITH TIME ZONE
- Parquet type: TIMESTAMP_MILLIS

Timestamp stored without a time zone that represents floating time (has the
same apparent value regardless of time zone, does not refer to an absolute
time instant):
- ANSI SQL type: -
- Oracle type: TIMESTAMP
- PostgeSQL type: TIMESTAMP
- Impala type: TIMESTAMP (stored as INT96 in Parquet)

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
[2] http://www.wiscorp.com/sql_2003_standard.zip

Zoltan


On Thu, Feb 23, 2017 at 10:46 PM Marcel Kornacker <[email protected]> wrote:

> Regarding timestamp with timezone: I'm not sure whether the SQL
> standard requires the timezone to be stored along with the timestamp
> for 'timestamp with timezone' (at least Oracle and Postgres diverge on
> that topic).
>
> Cc'ing Greg Rahn to shed some more light on that.
>
> Regarding 'make Impala depend on parquet-cpp': could someone expand on
> why we want to do this? There probably is overlap between
> Impala/Kudu/parquet-cpp, but the runtime systems of the first two have
> specific requirements (and are also different from each other), so
> trying to unify this into parquet-cpp seems difficult.
>
> On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:
> >  Attendees/agenda:
> > - Nandor, Zoltan (Cloudera/file formats)
> > - Lars (Cloudera/Impala)" Statistics progress
> > - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
> > - Wes (twosigma): parquet cpp rc. 1.0 Release
> > - Julien (Dremio): parquet metadata. Statistics.
> > - Deepak (HP/Vertica): Parquet-cpp
> > - Kazuaki:
> > - Ryan was excused :)
> >
> > Note:
> >  - Statistics: https://github.com/apache/parquet-format/pull/46
> >    - Impala is waiting for parquet-format to settle on the format to
> > finalize their simple mentation.
> >    - Action: Julien to follow up with Ryan on the PR
> >
> >  - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
> > (needs Ryan's feedback)
> >    - format is nanosecond level timestamp from midnight (64 bits)
> followed
> > by number of days (32 bits)
> >    - it sounds like int96 ordering is different from natural byte array
> > ordering because days is last in the bytes
> >    - discussion about swapping bytes:
> >       - format dependent on the boost library used
> >       - there could be performance concerns in Impala against changing it
> >       - there may be a separate project in impala to swap the bytes for
> > kudu compatibility.
> >    - discussion about deprecating int96:
> >      - need to be able to read them always
> >      - not need to define ordering if we have a clear replacement
> >      - Need to clarify the requirement for alternative .
> >      - int64 could be enough it sounds that nanosecond granularity might
> > not be needed.
> >    - Julien to create JIRAs:
> >      - int96 ordering
> >      - int96 deprecation, replacement.
> >
> > - extra timestamp logical type:
> >  - floating timestamp: (not TZ stored. up to the reader to interpret TS
> > based on their TZ)
> >     - this would be better for following sql standard
> >     - Julien to create JIRA
> >  - timestamp with timezone (per SQL):
> >     - each value has timezone
> >     - TZ can be different for each value
> >     - Julien to create JIRA
> >
> >  - parquet-cpp 1.0 release
> >    - Uwe to update release script in master.
> >    - Uwe to launch a new vote with new RC
> >
> >  - make impala depend on parquet-cpp
> >   - duplication between parquet/impala/kudu
> >   - need to measure level of overlap
> >   - Wes to open JIRA for this
> >   - also need an "apache commons for c++” for SQL type operations:
> >      -> could be in arrow
> >
> >   - metadata improvements.
> >    - add page level metadata in footer
> >    - page skipping.
> >    - Julien to open JIRA.
> >
> >  - add version of the writer in the footer (more precise than current).
> >    - Zoltan to open Jira
> >    - possibly add bitfield for bug fixes.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]>
> wrote:
> >
> >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>
> >> --
> >> Julien
> >>
> >
> >
> >
> > --
> > Julien
>

Reply via email to