Notes:

Attendees/Agenda:
Zoltan (Cloudera, file formats):
  - timestamp types
Ryan (Netflix):
  - timestamp types
  - fix for sorting metadata (min-max)
Deepak (Vertica, parquet-cpp):
  - timestamp
Emily (IBM Spark Technology Center)
Greg (Cloudera):
  - timestamp
Lars (Cloudera Impala):
  - min-max (https://github.com/apache/parquet-format/pull/46)
Marcel (Cloudera Impala):
  - timestamp
  - sorting/min max
  - bloom filters
Julien (Dremio):
 - sorting/min max
  - timestamp

- Timestamp (2 types):
  - Floating Timestamp
    - ambiguous with respect to the TZ: the data stored is
      year/month/day/microseconds.
    - timezone-less
    - same binary representation as the current Timestamp, with a different
      logical annotation.
    - open: how to store the metadata. The binary format is the same with or
      without a timezone.
    - action: Ryan to propose a PR on parquet-format
  - Timestamp with Timezone.
    - stored in UTC
    - client side conversion to UTC
    - writer timezone should be stored in the metadata?
  - need to clarify if time can be adjusted.
  - Int96: to be deprecated
    - int64 used instead with logical type.
    - won’t fix int96 ordering. Instead use replacement type.
    - Lars to update the JIRA (PARQUET-323)
  - new binary format: int64 storing actual date (year month day) +
    microseconds since midnight.
    - Marcel to open a JIRA.
- Sorting:
  - Ryan to update the PR (https://github.com/apache/parquet-format/pull/46)
- Bloom filter (PARQUET-319, PARQUET-41):
  - take analysis from original PR:
    - https://github.com/apache/parquet-mr/pull/215
    - https://github.com/apache/parquet-format/pull/28
  - need to define metadata.
- C++ code reuse between parquet-cpp, Impala, …
  - Impala team to discuss how they want to do that.
- store page level stats in footer (PARQUET-907)
  - several options:
    - Index Page: similar to an ISAM index, one per row group. If the data is
      ordered, it holds just maxes and offsets.
    - add optional field in footer metadata.
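
To make the difference between the two timestamp types above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that both types are encoded as int64 microseconds since 1970-01-01; the notes do not fix the encoding, and the function names are hypothetical. The floating type stores the wall-clock fields as written, while the timezone-aware type is converted to UTC by the client before storage.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: int64 microseconds since 1970-01-01.
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def encode_floating(wall_clock: datetime) -> int:
    """Floating timestamp: store the wall-clock value as-is, no TZ adjustment."""
    if wall_clock.tzinfo is not None:
        raise ValueError("floating timestamps carry no timezone")
    return (wall_clock - EPOCH_NAIVE) // ONE_MICRO


def encode_utc_adjusted(instant: datetime) -> int:
    """Timestamp with timezone: the client converts to UTC before storing."""
    if instant.tzinfo is None:
        raise ValueError("a timezone is required to convert to UTC")
    return (instant.astimezone(timezone.utc) - EPOCH_UTC) // ONE_MICRO


# Same wall-clock reading, interpreted with and without a writer timezone.
wall = datetime(2017, 3, 8, 10, 29)
writer_tz = timezone(timedelta(hours=-8))  # hypothetical writer timezone
floating_value = encode_floating(wall)
utc_value = encode_utc_adjusted(wall.replace(tzinfo=writer_tz))
```

Both values share the same int64 binary representation; only the logical annotation tells a reader whether the number is an instant in UTC or a timezone-less wall-clock reading. In this example the two stored values differ by exactly the writer's UTC offset.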

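The Index Page option for page-level stats can be sketched as follows, assuming the column is sorted within the row group so that only per-page maxes and offsets are needed. The function name and the list-based layout are hypothetical and are not a proposed metadata format.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_page_offset(page_maxes: Sequence[int],
                     page_offsets: Sequence[int],
                     key: int) -> Optional[int]:
    """Return the offset of the first page that may contain `key`.

    Assumes the data is ordered, so page_maxes is non-decreasing and
    the first page whose max is >= key is the only place to start reading.
    Returns None when key is larger than every page max.
    """
    if len(page_maxes) != len(page_offsets):
        raise ValueError("one offset is required per page max")
    i = bisect_left(page_maxes, key)
    if i == len(page_maxes):
        return None
    return page_offsets[i]


# Hypothetical index for one row group with three pages.
maxes = [10, 20, 30]
offsets = [0, 4096, 8192]
```

With this layout a point lookup costs one binary search over the index rather than a scan of every page header, which is the ISAM-like behaviour mentioned in the notes.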


On Wed, Mar 8, 2017 at 10:29 AM, Julien Le Dem <[email protected]> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien
>



-- 
Julien
