Attendees/agenda:
- Nandor, Zoltan (Cloudera/file formats)
- Lars (Cloudera/Impala): Statistics progress
- Uwe (Blue Yonder): parquet-cpp RC, int96 timestamps
- Wes (Two Sigma): parquet-cpp RC, 1.0 release
- Julien (Dremio): Parquet metadata, statistics
- Deepak (HP/Vertica): parquet-cpp
- Kazuaki:
- Ryan was excused :)

Note:
 - Statistics: https://github.com/apache/parquet-format/pull/46
   - Impala is waiting for parquet-format to settle on the format before finalizing its implementation.
   - Action: Julien to follow up with Ryan on the PR

 - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
(needs Ryan's feedback)
   - the format is a nanosecond-level time of day since midnight (64 bits) followed by a number of days (32 bits)
   - int96 ordering appears to differ from natural byte-array ordering because the day count comes last in the bytes
   - discussion about swapping the bytes:
      - the format depends on the Boost library used
      - there could be performance concerns in Impala about changing it
      - there may be a separate project in Impala to swap the bytes for Kudu compatibility
   - discussion about deprecating int96:
     - readers will always need to be able to read int96 values
     - there is no need to define an ordering if we have a clear replacement
     - need to clarify the requirements for the replacement
     - int64 could be enough; it sounds like nanosecond granularity might not be needed
   - Julien to create JIRAs:
     - int96 ordering
     - int96 deprecation and replacement
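
To make the int96 ordering problem concrete, here is a minimal sketch. It assumes (neither point is stated in these notes) that both fields are little-endian and that the 32-bit day count is a Julian day number, as in Impala/Hive-written files; encode_int96, decode_int96 and chronological_key are hypothetical helpers for illustration, not a Parquet API. It also works out the int64 range arithmetic behind "nanosecond granularity might not be needed".

```python
import struct
from datetime import datetime, timedelta

# ASSUMPTIONS (not stated in the notes): both fields are little-endian and the
# 32-bit day count is a Julian day number, as in Impala/Hive-written files.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
NANOS_PER_HOUR = 3600 * 10**9


def encode_int96(nanos_of_day: int, julian_day: int) -> bytes:
    """Pack nanoseconds since midnight (8 bytes), then the day count (4 bytes)."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> datetime:
    """Unpack a 12-byte value into a naive datetime, truncated to microseconds."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return datetime(1970, 1, 1) + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1000,
    )


def chronological_key(raw: bytes) -> tuple:
    """Sort key that compares the day first, i.e. the reverse of storage order."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# 23:00 (+2 ns) on 1970-01-01 vs. 01:00 (+1 ns) on 1970-01-02.
late_day1 = encode_int96(23 * NANOS_PER_HOUR + 2, JULIAN_DAY_OF_UNIX_EPOCH)
early_day2 = encode_int96(1 * NANOS_PER_HOUR + 1, JULIAN_DAY_OF_UNIX_EPOCH + 1)

print(chronological_key(late_day1) < chronological_key(early_day2))  # True
# A plain byte-array comparison is decided by the leading time-of-day bytes
# (here the first byte, 0x02 vs. 0x01) before it ever reaches the day count.
print(late_day1 < early_day2)  # False

# Range of a signed 64-bit count, in years on either side of the epoch.
SECONDS_PER_YEAR = 365.25 * 86400
INT64_MAX = 2**63 - 1
years_at_nanos = INT64_MAX / 10**9 / SECONDS_PER_YEAR   # about 292 years
years_at_micros = INT64_MAX / 10**6 / SECONDS_PER_YEAR  # about 292,000 years
print(round(years_at_nanos), round(years_at_micros))
```

The arithmetic suggests why the granularity question matters for a replacement: an int64 of nanoseconds only spans roughly 292 years around the epoch, while an int64 of microseconds spans hundreds of thousands of years.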

 - additional timestamp logical types:
   - floating timestamp (no TZ stored; it is up to the reader to interpret the timestamp in its own TZ):
     - this would better follow the SQL standard
     - Julien to create JIRA
   - timestamp with time zone (per SQL):
     - each value has a time zone
     - the TZ can be different for each value
     - Julien to create JIRA
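
The difference between the two proposed timestamp types can be illustrated with Python's naive and aware datetimes. This is only an analogy, not the Parquet encoding (which the notes leave to the JIRAs); the fixed UTC offsets below are arbitrary stand-ins for reader and per-value time zones.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets stand in for reader/value time zones (illustrative only).
UTC_MINUS_8 = timezone(timedelta(hours=-8))
UTC_PLUS_1 = timezone(timedelta(hours=1))

# Floating timestamp: only the wall-clock value is stored, with no zone.
floating = datetime(2017, 2, 23, 10, 1)

# Each reader attaches its own zone, so the same stored value
# denotes different instants for different readers.
reader_a = floating.replace(tzinfo=UTC_MINUS_8)
reader_b = floating.replace(tzinfo=UTC_PLUS_1)
print(reader_a - reader_b)  # 9:00:00

# Timestamp with time zone: every value carries its own offset,
# and the offsets may differ from value to value.
with_tz = [
    datetime(2017, 2, 23, 10, 1, tzinfo=UTC_MINUS_8),
    datetime(2017, 2, 23, 19, 1, tzinfo=UTC_PLUS_1),
]
print(with_tz[0] == with_tz[1])  # True: same instant, different stored zones
```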

 - parquet-cpp 1.0 release
   - Uwe to update the release script in master.
   - Uwe to launch a new vote with new RC

 - make Impala depend on parquet-cpp:
   - there is code duplication between Parquet, Impala, and Kudu
   - need to measure the level of overlap
   - Wes to open JIRA for this
   - also need an "Apache Commons for C++" for SQL type operations:
     -> could be in Arrow

 - metadata improvements:
   - add page-level metadata in the footer
   - page skipping
   - Julien to open JIRA
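
Page skipping with footer metadata could work roughly as follows. This is a minimal sketch under hypothetical assumptions: the footer is imagined to hold a min/max range and file offset per page (PageStats, pages_to_read and the offsets are made up for illustration), and a reader keeps only pages whose range can overlap the query predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """HYPOTHETICAL per-page footer entry: value range and file offset."""
    offset: int
    min_value: int
    max_value: int


def pages_to_read(pages: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return offsets of pages whose [min, max] range overlaps [lo, hi]."""
    return [p.offset for p in pages if p.max_value >= lo and p.min_value <= hi]


# Made-up footer index for one column chunk with three pages.
footer_index = [
    PageStats(offset=4, min_value=0, max_value=99),
    PageStats(offset=1024, min_value=100, max_value=199),
    PageStats(offset=2048, min_value=200, max_value=299),
]

# A predicate like "x BETWEEN 150 AND 160" only needs the second page.
print(pages_to_read(footer_index, 150, 160))  # [1024]
```

Keeping this index in the footer would presumably let a reader decide which pages to fetch without first reading every page header.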

 - add the writer version to the footer (more precise than the current one):
   - Zoltan to open JIRA
   - possibly add a bitfield for bug fixes
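
One way a bug-fix bitfield might be used: a reader checks for a specific fix bit instead of parsing version strings. This is a sketch only; the notes define no bit assignments, so WriterFixes, its members and can_trust_stats are entirely hypothetical.

```python
from enum import IntFlag


class WriterFixes(IntFlag):
    """HYPOTHETICAL bit assignments; the notes do not define any."""
    NONE = 0
    STATS_FIX = 1 << 0   # e.g. a fix to how min/max statistics are computed
    OTHER_FIX = 1 << 1   # placeholder for an unrelated fix


def can_trust_stats(writer_fixes: int) -> bool:
    """Use min/max statistics only if the writer recorded the stats fix."""
    return bool(WriterFixes(writer_fixes) & WriterFixes.STATS_FIX)


old_writer = WriterFixes.OTHER_FIX
new_writer = WriterFixes.STATS_FIX | WriterFixes.OTHER_FIX
print(can_trust_stats(old_writer))  # False
print(can_trust_stats(new_writer))  # True
```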

On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien
>

-- 
Julien
