Attendees/Agenda
Julien (Dremio):
 - Parquet-format: arrow types parity.
 - parquet-mr: Parquet-Arrow schema converter PR
Ryan (Netflix):
 - present new parquet-cli
 - Parquet sort order proposal
Gabor, Zoltan (Cloudera, file formats team):
 - getting started
Uwe (Blue Yonder):
 - parquet-cpp getting close to release
 - type changes with arrow discussion

Parquet logical types:
 - Julien proposed new logical types to bring parity with Arrow: Union,
Interval types, Null, Half Precision floats
 - TODO(Julien): add LogicalType doc for new types.
 - Union:
    - differentiate between a null union and projecting another value,
using the union's own optional fields.
    - describe union type constraints.
 - Null: type for things that are always null. For example, data coming from
schema discovery on JSON with a field that is always null.
 - Interval Type:
   - uses actual SQL spec for interval units
   - deprecate existing Interval logical type.
 - Half precision float: punt on that for now.
   - defined in Arrow metadata
   - actually not implemented in arrow-cpp and arrow-java
   - possibly add physical type for half precision types.
   - add encodings?  See Ryan’s PR for float encoding

 - Uwe: TIMESTAMP_NANOS ?
   - used in Pandas
   - used in Hive (through Parquet’s loosely defined int96)
   - debate whether we should support it or not.
   - Possibly have an int64 or fixed length byte array to store it.
   - TODO(Uwe): open a JIRA; Ryan to comment
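
To make the int64 option above concrete, here is a minimal sketch of the date range a signed 64-bit count of nanoseconds since the Unix epoch could represent. The epoch choice and the helper name nanos_to_datetime are assumptions for illustration, not something decided in the meeting.

```python
from datetime import datetime, timedelta, timezone

# Assumption: nanoseconds are counted from the Unix epoch in a signed int64.
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(nanos: int) -> datetime:
    """Convert nanoseconds since the epoch to a datetime.

    Python datetimes only carry microseconds, so the last three
    digits of the nanosecond value are dropped (floor division).
    """
    return EPOCH + timedelta(microseconds=nanos // 1000)


latest = nanos_to_datetime(INT64_MAX)
earliest = nanos_to_datetime(INT64_MIN)

print("earliest representable date:", earliest.date())
print("latest representable date:  ", latest.date())
```

Under these assumptions the range runs from 1677 to 2262, which is the trade-off against a wider fixed length byte array.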

Parquet-cli:
  - Ryan's new parquet-cli
  - easier to try encodings.
  - look at data.
  - includes some code from the Kite project (Apache 2 licensed).

Parquet sort order:
  - current proposal: have two separate min and max fields in the stats block
  - Ryan: to create a Pull Request.
  - how to formally specify sort order (comparator/collation)
  - standard database collations? Look into Calcite?
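
A small sketch of why the comparator needs a formal definition: the same byte strings yield different min/max statistics depending on how bytes are compared. The two comparators below (unsigned vs. signed byte-wise ordering) are illustrative assumptions, not the orderings the proposal will specify.

```python
def signed_key(value: bytes) -> list:
    """Sort key that treats each byte as a signed 8-bit integer."""
    return [b - 256 if b > 127 else b for b in value]


# UTF-8 encoded strings; "é" encodes to bytes above 0x7F.
values = [s.encode("utf-8") for s in ["a", "é", "z"]]

# Python compares bytes objects as unsigned by default.
unsigned_min, unsigned_max = min(values), max(values)

# The same data under a signed byte-wise comparison.
signed_min = min(values, key=signed_key)
signed_max = max(values, key=signed_key)

print("unsigned:", unsigned_min.decode(), unsigned_max.decode())
print("signed:  ", signed_min.decode(), signed_max.decode())
```

A reader that assumes one ordering while the writer used the other could skip row groups incorrectly, which is one reason the stats need to record which sort order they were computed with.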

Parquet-cpp release?
  - fix bugs.
  - release JIRA.

next sync up in two weeks.

On Thu, Oct 27, 2016 at 9:59 AM, Julien Le Dem <[email protected]> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien
>



-- 
Julien
