Notes from the meeting
Attendees:
Julien (WeWork): release
Hakan (Criteo): moving to Parquet.
Marcel (unaffiliated)
Lars (Impala, Cloudera): new statistics min_value/max_value fields in parquet_v2.
Gabor (Cloudera): min/max stats impl., parquet-mr.
Zoltan (Cloudera): min/max
Anna (Cloudera): min/max
Uwe (BlueYonder)
Vuk Ercegovac (Cloudera)
Ryan (Netflix): getting reviews for PR 429; Parquet 2.0 reviews
Eric Owhadi (Trafodion): page-level filtering, min/max
Min_value/max_value implementation:
https://issues.apache.org/jira/browse/PARQUET-1025
- We should deprecate compareTo in Binary, since it operates at the physical type level, whereas ordering is a logical type notion.
- We discussed a possible better implementation of compareTo that would take the LogicalType into account, but agreed this would be a separate effort.
- Add a Comparator based on the logical type as the preferred way of comparing two values.
- Stats writer implementation:
  - The preferred implementation is for writers to populate the new min_value/max_value metadata fields instead of the old min/max fields, independently of the version.
  - Optionally, writers might also populate min/max for compatibility with older tools, but we should do this only if the need arises.
- Action: provide feedback on the JIRA above (PARQUET-1025)
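The points above contrast a compareTo on raw bytes with a Comparator chosen from the logical type, and say writers should fill min_value/max_value. The Java sketch below illustrates both ideas under stated assumptions. LogicalKind, comparatorFor, ColumnStatsSketch and the writeLegacyMinMax flag are hypothetical names, not parquet-mr's API. Only UTF8 strings and DECIMAL values stored as bytes are modeled. Filling the legacy min/max with signed byte order assumes that is what older readers expect.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Minimal sketch, not parquet-mr code: LogicalKind, comparatorFor and
// ColumnStatsSketch are hypothetical names used only for illustration.
final class LogicalStatsSketch {

  // Hypothetical subset of logical types whose values are stored as raw bytes.
  enum LogicalKind { UTF8, DECIMAL }

  // Physical-level order: bytes compared as signed Java bytes. It ignores the
  // logical type, which is why a byte-level compareTo can give the wrong order.
  static int compareSigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      if (a[i] != b[i]) {
        return Byte.compare(a[i], b[i]);
      }
    }
    return Integer.compare(a.length, b.length);
  }

  // Unsigned lexicographic order; for UTF-8 bytes this matches code point order.
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return Integer.compare(a.length, b.length);
  }

  // Comparator picked from the logical type rather than the physical type.
  static Comparator<byte[]> comparatorFor(LogicalKind kind) {
    switch (kind) {
      case UTF8:
        return LogicalStatsSketch::compareUnsigned;
      case DECIMAL:
        // Big-endian two's-complement unscaled value (assumes non-empty arrays).
        return (a, b) -> new BigInteger(a).compareTo(new BigInteger(b));
      default:
        throw new IllegalArgumentException("unsupported logical type: " + kind);
    }
  }

  // Writer-side accumulator: always tracks minValue/maxValue with the logical
  // order; tracks the legacy min/max (signed order, an assumption) only on opt-in.
  static final class ColumnStatsSketch {
    private final Comparator<byte[]> order;
    private final boolean writeLegacyMinMax;
    byte[] minValue, maxValue;   // new min_value / max_value
    byte[] legacyMin, legacyMax; // old min / max, left null unless opted in

    ColumnStatsSketch(LogicalKind kind, boolean writeLegacyMinMax) {
      this.order = comparatorFor(kind);
      this.writeLegacyMinMax = writeLegacyMinMax;
    }

    void update(byte[] v) {
      if (minValue == null || order.compare(v, minValue) < 0) { minValue = v; }
      if (maxValue == null || order.compare(v, maxValue) > 0) { maxValue = v; }
      if (writeLegacyMinMax) {
        if (legacyMin == null || compareSigned(v, legacyMin) < 0) { legacyMin = v; }
        if (legacyMax == null || compareSigned(v, legacyMax) > 0) { legacyMax = v; }
      }
    }
  }

  public static void main(String[] args) {
    byte[] a = "a".getBytes(StandardCharsets.UTF_8);           // 0x61
    byte[] eAcute = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9
    // Signed order ranks "é" before "a" because 0xC3 is negative as a Java byte.
    System.out.println(compareSigned(a, eAcute) > 0);
    System.out.println(comparatorFor(LogicalKind.UTF8).compare(a, eAcute) < 0);

    ColumnStatsSketch stats = new ColumnStatsSketch(LogicalKind.UTF8, false);
    stats.update(eAcute);
    stats.update(a);
    System.out.println(new String(stats.minValue, StandardCharsets.UTF_8));
    System.out.println(stats.legacyMin == null);
  }
}
```

Running main prints true, true, a and true: signed byte order ranks "é" before "a", the UTF8 comparator ranks it after, and the legacy fields stay null unless a writer opts in. Keeping the comparator separate from the byte container is one way to let the logical type, not the physical type, decide the order.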
Ryan has two PRs for review:
- Make sure the Hadoop API does not leak through the Parquet API:
https://github.com/apache/parquet-mr/pull/429
- Improved Read allocation API:
https://github.com/apache/parquet-mr/pull/390
Action: give feedback on pull requests.
Next meeting in 2 weeks, same time.
On Wed, Nov 22, 2017 at 8:57 AM, Julien Le Dem <[email protected]> wrote:
> https://meet.google.com/udi-dvmo-sva