Regarding timestamp with timezone: I'm not sure whether the SQL standard requires the timezone to be stored along with the timestamp for 'timestamp with timezone' (at least Oracle and Postgres diverge on that topic).
Cc'ing Greg Rahn to shed some more light on that. Regarding 'make Impala depend on parquet-cpp': could someone expand on why we want to do this? There probably is overlap between Impala/Kudu/parquet-cpp, but the runtime systems of the first two have specific requirements (and are also different from each other), so trying to unify this into parquet-cpp seems difficult. On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote: > Attendees/agenda: > - Nandor, Zoltan (Cloudera/file formats) > - Lars (Cloudera/Impala)" Statistics progress > - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps > - Wes (twosigma): parquet cpp rc. 1.0 Release > - Julien (Dremio): parquet metadata. Statistics. > - Deepak (HP/Vertica): Parquet-cpp > - Kazuaki: > - Ryan was excused :) > > Note: > - Statistics: https://github.com/apache/parquet-format/pull/46 > - Impala is waiting for parquet-format to settle on the format to > finalize their simple mentation. > - Action: Julien to follow up with Ryan on the PR > > - Int96 timestamps: https://github.com/apache/parquet-format/pull/49 > (needs Ryan's feedback) > - format is nanosecond level timestamp from midnight (64 bits) followed > by number of days (32 bits) > - it sounds like int96 ordering is different from natural byte array > ordering because days is last in the bytes > - discussion about swapping bytes: > - format dependent on the boost library used > - there could be performance concerns in Impala against changing it > - there may be a separate project in impala to swap the bytes for > kudu compatibility. > - discussion about deprecating int96: > - need to be able to read them always > - not need to define ordering if we have a clear replacement > - Need to clarify the requirement for alternative . > - int64 could be enough it sounds that nanosecond granularity might > not be needed. > - Julien to create JIRAs: > - int96 ordering > - int96 deprecation, replacement. > > - extra timestamp logical type: > - floating timestamp: (not TZ stored. up to the reader to interpret TS > based on their TZ) > - this would be better for following sql standard > - Julien to create JIRA > - timestamp with timezone (per SQL): > - each value has timezone > - TZ can be different for each value > - Julien to create JIRA > > - parquet-cpp 1.0 release > - Uwe to update release script in master. > - Uwe to launch a new vote with new RC > > - make impala depend on parquet-cpp > - duplication between parquet/impala/kudu > - need to measure level of overlap > - Wes to open JIRA for this > - also need an "apache commons for c++” for SQL type operations: > -> could be in arrow > > - metadata improvements. > - add page level metadata in footer > - page skipping. > - Julien to open JIRA. > > - add version of the writer in the footer (more precise than current). > - Zoltan to open Jira > - possibly add bitfield for bug fixes. > > > > > > > > > > On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> wrote: > >> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up >> >> -- >> Julien >> > > > > -- > Julien
