On Mon, Feb 27, 2017 at 8:47 AM, Zoltan Ivanfi <[email protected]> wrote: > Hi, > > Although the draft of SQL-92[1] does not explicitly state that the time zone > offset has to be stored, the following excerpts strongly suggest that the > time zone has to be stored with each individual value of TIMESTAMP WITH TIME > ZONE: > > The length of a TIMESTAMP is 19 positions [...] > The length of a TIMESTAMP WITH TIME ZONE is 25 positions [...] > > The draft of SQL 2003[2] standard is more specific: > > TIMESTAMP and TIME may also be specified as being WITH TIME ZONE, in which > case every value has associated with it a time zone displacement.
It is not clear to me whether that means that the timezone is *stored* with the value. In Postgres, a timestamp still has an associated timezone on input and output (but it only stores the timezone normalized to UTC). > > However, the TIMESTAMP WITH TIME ZONE type in PostgreSQL does not conform to > this definition, but behaves like the plain TIMESTAMP type of the SQL > specification. Oracle calls this type TIMESTAMP WITH LOCAL TIME instead. > > It's easier to get an overview on this chaos if we list the different > interpretations and data types: > > Timestamp stored with a time zone: > - ANSI SQL type: TIMESTAMP WITH TIME ZONE > - Oracle type: TIMESTAMP WITH TIME ZONE > - PostgeSQL type: - > > Timestamp stored without a time zone that represents a time in UTC > (automatically converted to/from local time): > - ANSI SQL type: TIMESTAMP > - Oracle type: TIMESTAMP WITH LOCAL TIME > - PostgeSQL type: TIMESTAMP WITH TIME ZONE > - Parquet type: TIMESTAMP_MILLIS > > Timestamp stored without a time zone that represents floating time (has the > same apparent value regardless of time zone, does not refer to an absolute > time instant): > - ANSI SQL type: - > - Oracle type: TIMESTAMP > - PostgeSQL type: TIMESTAMP > - Impala type: TIMESTAMP (stored as INT96 in Parquet) > > [1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt > [2] http://www.wiscorp.com/sql_2003_standard.zip > > Zoltan > > > > On Thu, Feb 23, 2017 at 10:46 PM Marcel Kornacker <[email protected]> wrote: >> >> Regarding timestamp with timezone: I'm not sure whether the SQL >> standard requires the timezone to be stored along with the timestamp >> for 'timestamp with timezone' (at least Oracle and Postgres diverge on >> that topic). >> >> Cc'ing Greg Rahn to shed some more light on that. >> >> Regarding 'make Impala depend on parquet-cpp': could someone expand on >> why we want to do this? There probably is overlap between >> Impala/Kudu/parquet-cpp, but the runtime systems of the first two have >> specific requirements (and are also different from each other), so >> trying to unify this into parquet-cpp seems difficult. >> >> On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote: >> > Attendees/agenda: >> > - Nandor, Zoltan (Cloudera/file formats) >> > - Lars (Cloudera/Impala)" Statistics progress >> > - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps >> > - Wes (twosigma): parquet cpp rc. 1.0 Release >> > - Julien (Dremio): parquet metadata. Statistics. >> > - Deepak (HP/Vertica): Parquet-cpp >> > - Kazuaki: >> > - Ryan was excused :) >> > >> > Note: >> > - Statistics: https://github.com/apache/parquet-format/pull/46 >> > - Impala is waiting for parquet-format to settle on the format to >> > finalize their simple mentation. >> > - Action: Julien to follow up with Ryan on the PR >> > >> > - Int96 timestamps: https://github.com/apache/parquet-format/pull/49 >> > (needs Ryan's feedback) >> > - format is nanosecond level timestamp from midnight (64 bits) >> > followed >> > by number of days (32 bits) >> > - it sounds like int96 ordering is different from natural byte array >> > ordering because days is last in the bytes >> > - discussion about swapping bytes: >> > - format dependent on the boost library used >> > - there could be performance concerns in Impala against changing >> > it >> > - there may be a separate project in impala to swap the bytes for >> > kudu compatibility. >> > - discussion about deprecating int96: >> > - need to be able to read them always >> > - not need to define ordering if we have a clear replacement >> > - Need to clarify the requirement for alternative . >> > - int64 could be enough it sounds that nanosecond granularity might >> > not be needed. >> > - Julien to create JIRAs: >> > - int96 ordering >> > - int96 deprecation, replacement. >> > >> > - extra timestamp logical type: >> > - floating timestamp: (not TZ stored. up to the reader to interpret TS >> > based on their TZ) >> > - this would be better for following sql standard >> > - Julien to create JIRA >> > - timestamp with timezone (per SQL): >> > - each value has timezone >> > - TZ can be different for each value >> > - Julien to create JIRA >> > >> > - parquet-cpp 1.0 release >> > - Uwe to update release script in master. >> > - Uwe to launch a new vote with new RC >> > >> > - make impala depend on parquet-cpp >> > - duplication between parquet/impala/kudu >> > - need to measure level of overlap >> > - Wes to open JIRA for this >> > - also need an "apache commons for c++” for SQL type operations: >> > -> could be in arrow >> > >> > - metadata improvements. >> > - add page level metadata in footer >> > - page skipping. >> > - Julien to open JIRA. >> > >> > - add version of the writer in the footer (more precise than current). >> > - Zoltan to open Jira >> > - possibly add bitfield for bug fixes. >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> >> > wrote: >> > >> >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up >> >> >> >> -- >> >> Julien >> >> >> > >> > >> > >> > -- >> > Julien
