Thank you Julien for writing up the notes! Here is the Impala JIRA I mentioned that tracks swapping the fields of TimestampValue: IMPALA-4825 <https://issues.cloudera.org/browse/IMPALA-4825>
A change is out for review here: https://gerrit.cloudera.org/#/c/6048/

Cheers, Lars

On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:

> Attendees/agenda:
> - Nandor, Zoltan (Cloudera/file formats)
> - Lars (Cloudera/Impala): Statistics progress
> - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
> - Wes (twosigma): parquet cpp rc. 1.0 Release
> - Julien (Dremio): parquet metadata. Statistics.
> - Deepak (HP/Vertica): Parquet-cpp
> - Kazuaki:
> - Ryan was excused :)
>
> Note:
> - Statistics: https://github.com/apache/parquet-format/pull/46
>   - Impala is waiting for parquet-format to settle on the format to
>     finalize their implementation.
>   - Action: Julien to follow up with Ryan on the PR
>
> - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
>   (needs Ryan's feedback)
>   - format is nanosecond level timestamp from midnight (64 bits) followed
>     by number of days (32 bits)
>   - it sounds like int96 ordering is different from natural byte array
>     ordering because days is last in the bytes
>   - discussion about swapping bytes:
>     - format dependent on the boost library used
>     - there could be performance concerns in Impala against changing it
>     - there may be a separate project in Impala to swap the bytes for
>       Kudu compatibility.
>   - discussion about deprecating int96:
>     - need to be able to read them always
>     - no need to define ordering if we have a clear replacement
>     - need to clarify the requirements for an alternative.
>     - int64 could be enough; it sounds like nanosecond granularity might
>       not be needed.
>   - Julien to create JIRAs:
>     - int96 ordering
>     - int96 deprecation, replacement.
>
> - extra timestamp logical type:
>   - floating timestamp (no TZ stored; up to the reader to interpret the
>     TS based on their TZ):
>     - this would be better for following the SQL standard
>     - Julien to create JIRA
>   - timestamp with timezone (per SQL):
>     - each value has a timezone
>     - TZ can be different for each value
>     - Julien to create JIRA
>
> - parquet-cpp 1.0 release
>   - Uwe to update release script in master.
>   - Uwe to launch a new vote with new RC
>
> - make Impala depend on parquet-cpp
>   - duplication between parquet/impala/kudu
>   - need to measure level of overlap
>   - Wes to open JIRA for this
>   - also need an "apache commons for c++" for SQL type operations:
>     -> could be in arrow
>
> - metadata improvements.
>   - add page level metadata in footer
>     - page skipping.
>     - Julien to open JIRA.
>
>   - add version of the writer in the footer (more precise than current).
>     - Zoltan to open JIRA
>     - possibly add bitfield for bug fixes.
>
> On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> wrote:
>
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > --
> > Julien
>
> --
> Julien
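
To make the int96 ordering point from the notes concrete, here is a minimal Python sketch. It is an illustration only, not Impala or parquet-cpp code. It follows the layout described in the notes (64 bits of nanoseconds since midnight, then 32 bits of days); the little-endian packing, the unsigned fields, the sample day numbers, and all function names are my assumptions.

```python
import struct
from typing import Tuple


def encode_int96(days: int, nanos_of_day: int) -> bytes:
    """Pack nanoseconds since midnight (64 bits) followed by days (32 bits).

    Assumption: both fields are unsigned and little-endian.
    """
    return struct.pack("<QI", nanos_of_day, days)


def decode_int96(raw: bytes) -> Tuple[int, int]:
    """Return (days, nanos_of_day), which sorts chronologically as a tuple."""
    nanos_of_day, days = struct.unpack("<QI", raw)
    return days, nanos_of_day


def days_first_key(raw: bytes) -> bytes:
    """Hypothetical swapped layout: days first, big-endian.

    With unsigned big-endian fields, plain byte comparison matches time order.
    """
    days, nanos_of_day = decode_int96(raw)
    return struct.pack(">IQ", days, nanos_of_day)


# Two illustrative values: `earlier` falls on the previous day.
later = encode_int96(days=2457808, nanos_of_day=1)
earlier = encode_int96(days=2457807, nanos_of_day=2)

values = [later, earlier]
by_raw_bytes = sorted(values)                      # natural byte array ordering
by_time = sorted(values, key=decode_int96)         # chronological ordering
by_swapped = sorted(values, key=days_first_key)    # byte ordering after the swap

print(by_raw_bytes == by_time)   # the current layout does not sort by time
print(by_swapped == by_time)     # the swapped layout does
```

Because the nanoseconds field comes first, byte-wise comparison puts `later` ahead of `earlier`; with days first (and big-endian fields) the byte order and the time order agree, which is the motivation for swapping the fields.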
