On Mon, Feb 27, 2017 at 8:47 AM, Zoltan Ivanfi <[email protected]> wrote:
> Hi,
>
> Although the draft of SQL-92[1] does not explicitly state that the time zone
> offset has to be stored, the following excerpts strongly suggest that the
> time zone has to be stored with each individual value of TIMESTAMP WITH TIME
> ZONE:
>
>   The length of a TIMESTAMP is 19 positions [...]
>   The length of a TIMESTAMP WITH TIME ZONE is 25 positions [...]
>
> The draft of SQL 2003[2] standard is more specific:
>
>   TIMESTAMP and TIME may also be specified as being WITH TIME ZONE, in which
> case every value has associated with it a time zone displacement.

It is not clear to me whether that means that the timezone is *stored*
with the value. In Postgres, a timestamp still has an associated
timezone on input and output (but it only stores the timezone
normalized to UTC).

>
> However, the TIMESTAMP WITH TIME ZONE type in PostgreSQL does not conform to
> this definition, but behaves like the plain TIMESTAMP type of the SQL
> specification. Oracle calls this type TIMESTAMP WITH LOCAL TIME instead.
>
> It's easier to get an overview on this chaos if we list the different
> interpretations and data types:
>
> Timestamp stored with a time zone:
> - ANSI SQL type: TIMESTAMP WITH TIME ZONE
> - Oracle type: TIMESTAMP WITH TIME ZONE
> - PostgeSQL type: -
>
> Timestamp stored without a time zone that represents a time in UTC
> (automatically converted to/from local time):
> - ANSI SQL type: TIMESTAMP
> - Oracle type: TIMESTAMP WITH LOCAL TIME
> - PostgeSQL type: TIMESTAMP WITH TIME ZONE
> - Parquet type: TIMESTAMP_MILLIS
>
> Timestamp stored without a time zone that represents floating time (has the
> same apparent value regardless of time zone, does not refer to an absolute
> time instant):
> - ANSI SQL type: -
> - Oracle type: TIMESTAMP
> - PostgeSQL type: TIMESTAMP
> - Impala type: TIMESTAMP (stored as INT96 in Parquet)
>
> [1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> [2] http://www.wiscorp.com/sql_2003_standard.zip
>
> Zoltan
>
>
>
> On Thu, Feb 23, 2017 at 10:46 PM Marcel Kornacker <[email protected]> wrote:
>>
>> Regarding timestamp with timezone: I'm not sure whether the SQL
>> standard requires the timezone to be stored along with the timestamp
>> for 'timestamp with timezone' (at least Oracle and Postgres diverge on
>> that topic).
>>
>> Cc'ing Greg Rahn to shed some more light on that.
>>
>> Regarding 'make Impala depend on parquet-cpp': could someone expand on
>> why we want to do this? There probably is overlap between
>> Impala/Kudu/parquet-cpp, but the runtime systems of the first two have
>> specific requirements (and are also different from each other), so
>> trying to unify this into parquet-cpp seems difficult.
>>
>> On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:
>> >  Attendees/agenda:
>> > - Nandor, Zoltan (Cloudera/file formats)
>> > - Lars (Cloudera/Impala)" Statistics progress
>> > - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
>> > - Wes (twosigma): parquet cpp rc. 1.0 Release
>> > - Julien (Dremio): parquet metadata. Statistics.
>> > - Deepak (HP/Vertica): Parquet-cpp
>> > - Kazuaki:
>> > - Ryan was excused :)
>> >
>> > Note:
>> >  - Statistics: https://github.com/apache/parquet-format/pull/46
>> >    - Impala is waiting for parquet-format to settle on the format to
>> > finalize their simple mentation.
>> >    - Action: Julien to follow up with Ryan on the PR
>> >
>> >  - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
>> > (needs Ryan's feedback)
>> >    - format is nanosecond level timestamp from midnight (64 bits)
>> > followed
>> > by number of days (32 bits)
>> >    - it sounds like int96 ordering is different from natural byte array
>> > ordering because days is last in the bytes
>> >    - discussion about swapping bytes:
>> >       - format dependent on the boost library used
>> >       - there could be performance concerns in Impala against changing
>> > it
>> >       - there may be a separate project in impala to swap the bytes for
>> > kudu compatibility.
>> >    - discussion about deprecating int96:
>> >      - need to be able to read them always
>> >      - not need to define ordering if we have a clear replacement
>> >      - Need to clarify the requirement for alternative .
>> >      - int64 could be enough it sounds that nanosecond granularity might
>> > not be needed.
>> >    - Julien to create JIRAs:
>> >      - int96 ordering
>> >      - int96 deprecation, replacement.
>> >
>> > - extra timestamp logical type:
>> >  - floating timestamp: (not TZ stored. up to the reader to interpret TS
>> > based on their TZ)
>> >     - this would be better for following sql standard
>> >     - Julien to create JIRA
>> >  - timestamp with timezone (per SQL):
>> >     - each value has timezone
>> >     - TZ can be different for each value
>> >     - Julien to create JIRA
>> >
>> >  - parquet-cpp 1.0 release
>> >    - Uwe to update release script in master.
>> >    - Uwe to launch a new vote with new RC
>> >
>> >  - make impala depend on parquet-cpp
>> >   - duplication between parquet/impala/kudu
>> >   - need to measure level of overlap
>> >   - Wes to open JIRA for this
>> >   - also need an "apache commons for c++” for SQL type operations:
>> >      -> could be in arrow
>> >
>> >   - metadata improvements.
>> >    - add page level metadata in footer
>> >    - page skipping.
>> >    - Julien to open JIRA.
>> >
>> >  - add version of the writer in the footer (more precise than current).
>> >    - Zoltan to open Jira
>> >    - possibly add bitfield for bug fixes.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]>
>> > wrote:
>> >
>> >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>> >>
>> >> --
>> >> Julien
>> >>
>> >
>> >
>> >
>> > --
>> > Julien

Reply via email to