Notes:

Attendees/agenda building:
Zoltan (Cloudera):
  - timestamp, min/max
Anna (Cloudera)
Deepak (Vertica):
  - timestamp
  - c++/java: bloom filter
Lars (Cloudera Impala):
  - page skipping indexes
  - open PRs
Pooja (Cloudera Impala):
  - page skipping indexes
Julien (Dremio):
  - page skipping indexes
  - timestamp

Agenda:
- open PRs
  - TODO (all): review:
    - https://github.com/apache/parquet-format/pull/54
    - https://github.com/apache/parquet-mr/pull/414
    - https://github.com/apache/parquet-mr/pull/411
    - https://github.com/apache/parquet-mr/pull/413
    - https://github.com/apache/parquet-mr/pull/410
  - TODO: follow up (Julien, Lars, Ryan): https://github.com/apache/parquet-format/pull/53
  - Ryan follow up: https://github.com/apache/parquet-format/pull/51
  - Julien more tests: https://github.com/apache/parquet-format/pull/50
  - Ryan follow up: https://github.com/apache/parquet-format/pull/49
- PR triage:
  - TODO: Lars to do a pass on parquet-format
  - TODO: Julien to do a pass on parquet-mr
- timestamps:
  - When reading from Parquet to Arrow: if the timestamp is adjusted to UTC
    (isAdjustedToUTC), we use the UTC timezone in Arrow; otherwise no timezone
    (timestamp without timezone).
  - follow up on jira about timestamp with timezone: PARQUET-906
- min/max: PARQUET-686
  - final conclusion: https://github.com/apache/parquet-format/pull/46
  - PARQUET-839 => duplicate of PARQUET-686
  - TODO close obsolete PRs:
    - https://github.com/apache/parquet-format/pull/42
    - https://github.com/apache/parquet-mr/pull/362
  - We need an implementation in parquet-mr for the metadata in
    https://github.com/apache/parquet-format/pull/46
    - TODO: Zoltan to open a jira
    - Impala has an implementation; we should test that they are compatible
- bloom filter
  - PARQUET-319: see linked PR and doc.
    - https://github.com/apache/parquet-format/pull/28
    - https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit#heading=h.hmt1hrab3fpc
  - TODO: review and give feedback
- page skipping indexes
  - plan is to prototype a writer in Impala, then a reader.
  - We’ll review the results to finalize the metadata in 5-6 weeks.
- dealing with statistics coming from parquet-cpp
  - new min/max_value fields will be the reference

On Wed, Jun 7, 2017 at 10:54 AM, Wes McKinney wrote:
> Sorry, I was unable to join the sync today. I'm interested to discuss
> more my comments on
>
> https://github.com/apache/parquet-format/pull/51#discussion_r119911623
>
> I'll wait for the notes from the call and maybe we can continue the
> discussion on GitHub
>
> On Wed, Jun 7, 2017 at 12:53 PM, Julien Le Dem wrote:
> > 10am PT on google hangout:
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > Reminder that this is open to all.
> > Here is how it goes:
> > - we do a "round table" of people present where they quickly introduce
> > themselves and state the topics they wish discussed (if any. Being a
> > "fly on the wall" is totally fine too)
> > - based on that first round we summarize the agenda and go over the
> > topics one by one. (can be just bringing attention of people to a PR
> > that needs a review or asking if it makes sense to implement some new
> > feature)
> > - In the end we send notes back to the list and follow ups happen on
> > JIRA, github PRs and the dev list.
> > - if the time is inconvenient to you say so on the list and we can
> > figure out something.
> >
> > --
> > Julien
>

--
Julien
