Attendees/agenda:
- Nandor, Zoltan (Cloudera/file formats)
- Lars (Cloudera/Impala): Statistics progress
- Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
- Wes (twosigma): parquet cpp rc. 1.0 Release
- Julien (Dremio): parquet metadata. Statistics.
- Deepak (HP/Vertica): Parquet-cpp
- Kazuaki:
- Ryan was excused :)

Notes:
- Statistics: https://github.com/apache/parquet-format/pull/46
  - Impala is waiting for parquet-format to settle on the format to finalize their implementation.
  - Action: Julien to follow up with Ryan on the PR
- Int96 timestamps: https://github.com/apache/parquet-format/pull/49 (needs Ryan's feedback)
  - format is a nanosecond-level timestamp from midnight (64 bits) followed by the number of days (32 bits)
  - it sounds like int96 ordering is different from natural byte array ordering because days is last in the bytes
  - discussion about swapping bytes:
    - format dependent on the boost library used
    - there could be performance concerns in Impala against changing it
    - there may be a separate project in Impala to swap the bytes for Kudu compatibility.
  - discussion about deprecating int96:
    - need to be able to read them always
    - no need to define ordering if we have a clear replacement
    - need to clarify the requirements for the alternative.
    - int64 could be enough; it sounds like nanosecond granularity might not be needed.
  - Julien to create JIRAs:
    - int96 ordering
    - int96 deprecation, replacement.
- extra timestamp logical type:
  - floating timestamp: (TZ not stored; up to the reader to interpret TS based on their TZ)
    - this would be better for following the SQL standard
    - Julien to create JIRA
  - timestamp with timezone (per SQL):
    - each value has a timezone
    - TZ can be different for each value
    - Julien to create JIRA
- parquet-cpp 1.0 release
  - Uwe to update release script in master.
  - Uwe to launch a new vote with new RC
- make Impala depend on parquet-cpp
  - duplication between parquet/impala/kudu
  - need to measure level of overlap
  - Wes to open JIRA for this
  - also need an "apache commons for c++" for SQL type operations: -> could be in arrow
- metadata improvements.
  - add page level metadata in footer
    - page skipping.
    - Julien to open JIRA.
  - add version of the writer in the footer (more precise than current).
    - Zoltan to open JIRA
    - possibly add bitfield for bug fixes.
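
To illustrate the int96 ordering point from the notes, here is a minimal Python sketch. It assumes both fields are stored little-endian (the notes only give the field order and widths), and the helper names `encode_int96`, `decode_int96` and `encode_days_first` are made up for this example. The last helper is a hypothetical "swapped" layout, not something agreed on in the meeting.

```python
import struct
from typing import Tuple


def encode_int96(days: int, nanos_of_day: int) -> bytes:
    """Pack 64-bit nanoseconds since midnight, then a 32-bit day count.

    Little-endian byte order is an assumption made for this sketch.
    """
    return struct.pack("<qi", nanos_of_day, days)


def decode_int96(raw: bytes) -> Tuple[int, int]:
    """Return (days, nanos_of_day), which sorts chronologically as a tuple."""
    nanos_of_day, days = struct.unpack("<qi", raw)
    return days, nanos_of_day


def encode_days_first(days: int, nanos_of_day: int) -> bytes:
    """Hypothetical swapped layout: big-endian, with days in the leading bytes.

    For non-negative values, plain byte comparison then matches time order.
    """
    return struct.pack(">iq", days, nanos_of_day)


earlier = encode_int96(days=1, nanos_of_day=2)
later = encode_int96(days=2, nanos_of_day=1)

# Byte-wise comparison looks at the nanoseconds first, so it gets this wrong.
print(earlier < later)  # False

# Sorting on the decoded (days, nanos) pair restores chronological order.
print(sorted([later, earlier], key=decode_int96) == [earlier, later])  # True

# With days first and big-endian fields, byte order agrees with time order.
print(encode_days_first(1, 2) < encode_days_first(2, 1))  # True
```

Because days occupy the trailing bytes, min/max statistics computed with byte array ordering would not be usable for timestamp range filtering, which is why the ordering needs to be defined (or int96 replaced).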

On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien

--
Julien
