Thank you Julien for writing up the notes! Here is the Impala JIRA I
mentioned that tracks swapping the fields of TimestampValue: IMPALA-4825
<https://issues.cloudera.org/browse/IMPALA-4825>

A change is out for review here: https://gerrit.cloudera.org/#/c/6048/

Cheers, Lars

On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:

>  Attendees/agenda:
> - Nandor, Zoltan (Cloudera/file formats)
> - Lars (Cloudera/Impala): Statistics progress
> - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
> - Wes (twosigma): Parquet cpp RC. 1.0 release
> - Julien (Dremio): parquet metadata. Statistics.
> - Deepak (HP/Vertica): Parquet-cpp
> - Kazuaki:
> - Ryan was excused :)
>
> Notes:
>  - Statistics: https://github.com/apache/parquet-format/pull/46
>    - Impala is waiting for parquet-format to settle on the format to
> finalize their implementation.
>    - Action: Julien to follow up with Ryan on the PR
>
>  - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
> (needs Ryan's feedback)
>    - format is nanoseconds since midnight (64 bits) followed by the
> number of days (32 bits)
>    - it sounds like int96 ordering is different from natural byte array
> ordering because the days field comes last in the bytes
>    - discussion about swapping bytes:
>       - format dependent on the boost library used
>       - there could be performance concerns in Impala against changing it
>       - there may be a separate project in impala to swap the bytes for
> kudu compatibility.
>    - discussion about deprecating int96:
>      - need to be able to read them always
>      - no need to define ordering if we have a clear replacement
>      - need to clarify the requirements for an alternative.
>      - int64 could be enough; it sounds like nanosecond granularity might
> not be needed.
>    - Julien to create JIRAs:
>      - int96 ordering
>      - int96 deprecation, replacement.
>
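
A minimal sketch of the int96 layout and ordering issue noted above, not taken from Impala or parquet-cpp. It assumes little-endian fields and a Julian day number, neither of which is stated in the notes. The function names and the day-first layout are hypothetical, for illustration only:

```python
import struct


def encode_int96(day, nanos_of_day):
    """Pack nanoseconds since midnight (64 bits) followed by the day (32 bits).

    Assumption: both fields are little-endian and the day is a Julian day number.
    """
    return struct.pack("<qi", nanos_of_day, day)


def decode_int96(raw):
    """Return (day, nanos_of_day) from a 12-byte value."""
    nanos_of_day, day = struct.unpack("<qi", raw)
    return day, nanos_of_day


def chronological_key(raw):
    """Sort key that compares the day first, then the time within the day."""
    return decode_int96(raw)


def encode_day_first(day, nanos_of_day):
    """Hypothetical swapped layout: big-endian day, then big-endian nanoseconds.

    For non-negative fields, plain byte comparison then matches time order.
    """
    return struct.pack(">iq", day, nanos_of_day)


# Arbitrary example values: `earlier` is one day before `later`.
earlier = encode_int96(2457808, 2)
later = encode_int96(2457809, 1)

# Byte order disagrees with time order because the nanosecond bytes come first.
bytes_say_earlier_is_larger = earlier > later
# Decoding and comparing (day, nanos) restores the chronological order.
decoded_order_is_correct = chronological_key(earlier) < chronological_key(later)
```

With these values `bytes_say_earlier_is_larger` and `decoded_order_is_correct` are both true, which is why statistics on int96 cannot rely on plain byte-array comparison.
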
> - extra timestamp logical type:
>  - floating timestamp: (no TZ stored; it is up to the reader to interpret
> the TS based on their TZ)
>     - this would be better for following sql standard
>     - Julien to create JIRA
>  - timestamp with timezone (per SQL):
>     - each value has timezone
>     - TZ can be different for each value
>     - Julien to create JIRA
>
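
To illustrate the difference between the two proposed timestamp types, here is a small sketch. It only models the semantics with Python's standard `datetime` module and says nothing about how Parquet would encode either type. The fixed UTC offsets and the helper name are assumptions chosen for the example:

```python
from datetime import datetime, timedelta, timezone

# Floating timestamp: only wall-clock fields are stored, no zone.
floating = datetime(2017, 2, 23, 10, 1)


def interpret_floating(ts, reader_offset_hours):
    """Hypothetical helper: the reader attaches its own zone (a fixed offset here)."""
    return ts.replace(tzinfo=timezone(timedelta(hours=reader_offset_hours)))


# Two readers see the same wall-clock time but mean different instants.
reader_west = interpret_floating(floating, -8)
reader_east = interpret_floating(floating, 1)
gap_between_readers = reader_west - reader_east

# Timestamp with time zone: every value carries its own offset,
# and the offset may differ from value to value.
value_a = datetime(2017, 2, 23, 10, 1, tzinfo=timezone(timedelta(hours=-8)))
value_b = datetime(2017, 2, 23, 19, 1, tzinfo=timezone(timedelta(hours=1)))
same_instant = value_a == value_b
```

Here `gap_between_readers` is nine hours even though both readers display 10:01, while `value_a` and `value_b` display different times but denote the same instant.
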
>  - parquet-cpp 1.0 release
>    - Uwe to update release script in master.
>    - Uwe to launch a new vote with new RC
>
>  - make impala depend on parquet-cpp
>   - duplication between parquet/impala/kudu
>   - need to measure level of overlap
>   - Wes to open JIRA for this
>   - also need an "Apache Commons for C++" for SQL type operations:
>      -> could be in arrow
>
>   - metadata improvements.
>    - add page level metadata in footer
>    - page skipping.
>    - Julien to open JIRA.
>
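
One way page-level metadata in the footer could enable page skipping, as a rough sketch. The `PageStats` record and its fields are hypothetical and do not reflect any agreed format. The idea is only that a reader can drop pages whose min/max range cannot match a predicate before touching the page data:

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page footer entry: page location plus value range."""
    offset: int
    min_value: int
    max_value: int


def pages_to_read(pages, lo, hi):
    """Keep pages whose [min, max] range can overlap the predicate lo <= x <= hi."""
    return [p for p in pages if p.max_value >= lo and p.min_value <= hi]


# Made-up footer entries for three pages of one column.
pages = [
    PageStats(offset=0, min_value=1, max_value=100),
    PageStats(offset=4096, min_value=101, max_value=200),
    PageStats(offset=8192, min_value=201, max_value=300),
]

# A filter such as 150 <= x <= 160 only needs the second page.
selected = pages_to_read(pages, 150, 160)
```
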
>  - add version of the writer in the footer (more precise than current).
>    - Zoltan to open JIRA
>    - possibly add bitfield for bug fixes.
>
> On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <[email protected]> wrote:
>
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > --
> > Julien
> >
>
>
>
> --
> Julien
>
