Notes from the meeting

Attendees:
Julien (WeWork): release
Hakan (Criteo): moving to Parquet
Marcel (unaffiliated)
Lars (Impala, Cloudera): new statistics min_value/max_value fields in parquet_v2
Gabor (Cloudera): min/max stats impl., parquet-mr.
Zoltan (Cloudera): Min/max
Anna (Cloudera): Min/Max
Uwe (BlueYonder)
Vuk Ercegovac (Cloudera)
Ryan (Netflix): getting reviews for PR 429, Parquet 2.0 reviews
Eric Owhadi (Trafodion): page-level filtering, min/max

Min_value/max_value implementation:
     https://issues.apache.org/jira/browse/PARQUET-1025

   - We should deprecate compareTo in Binary, since it operates at the
   physical type level while ordering is a logical type notion.
      - We discussed a possible better implementation of compareTo that
      would take the LogicalType into account, but agreed this would be a
      separate effort.
   - Add a Comparator based on the logical type as the preferred way of
   comparing two values.
   - Stats writer implementation:
      - The preferred implementation is for writers to populate the new
      min_value/max_value metadata fields instead of the old min/max fields,
      independently of the version.
      - Optionally, writers might also populate min/max for compatibility
      with older tools, but we should do this only if the need arises.
   - Action: provide feedback on the JIRA above (PARQUET-1025)
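To make the compareTo concern concrete, here is a minimal Java sketch (not parquet-mr code) contrasting a naive signed byte-wise comparison, used as a stand-in for a physical-type comparison, with an unsigned lexicographic comparator, assuming that is the correct logical ordering for UTF8 strings. The names LogicalOrderSketch, SIGNED_BYTES, UTF8_UNSIGNED and MinMax are hypothetical; MinMax only illustrates that min/max stats depend on which comparator the writer uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.List;

public class LogicalOrderSketch {

    // Hypothetical "physical" ordering: compares raw bytes as signed Java bytes,
    // so any byte >= 0x80 sorts before plain ASCII.
    static final Comparator<byte[]> SIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length); // shorter prefix sorts first
    };

    // Hypothetical logical-type comparator for UTF8 strings: unsigned
    // lexicographic byte order, which matches Unicode code point order.
    static final Comparator<byte[]> UTF8_UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    // Hypothetical min/max accumulator: the resulting stats depend entirely
    // on the comparator it is given.
    static final class MinMax {
        private final Comparator<byte[]> cmp;
        byte[] min;
        byte[] max;

        MinMax(Comparator<byte[]> cmp) { this.cmp = cmp; }

        void update(byte[] v) {
            if (min == null || cmp.compare(v, min) < 0) min = v;
            if (max == null || cmp.compare(v, max) > 0) max = v;
        }
    }

    static String utf8(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        List<String> values = List.of("apple", "é", "zebra");
        MinMax physical = new MinMax(SIGNED_BYTES);
        MinMax logical = new MinMax(UTF8_UNSIGNED);
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            physical.update(b);
            logical.update(b);
        }
        System.out.println("signed:   min=" + utf8(physical.min) + " max=" + utf8(physical.max));
        System.out.println("unsigned: min=" + utf8(logical.min) + " max=" + utf8(logical.max));
    }
}
```

With these inputs the signed comparator reports min=é, max=zebra, while the unsigned one reports min=apple, max=é. Only the latter agrees with code point order, which is why stats written under one ordering and read under another can lead to incorrect filtering.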

Ryan has two PRs for review:

   - Make sure the Hadoop API does not leak through the Parquet API:
   https://github.com/apache/parquet-mr/pull/429
   - Improved read allocation API:
   https://github.com/apache/parquet-mr/pull/390

Action: give feedback on the pull requests.

Next meeting in 2 weeks, same time.




On Wed, Nov 22, 2017 at 8:57 AM, Julien Le Dem <[email protected]>
wrote:

> https://meet.google.com/udi-dvmo-sva
>
