Thank you!

On Fri, Apr 14, 2017 at 4:19 PM, Ryan Blue <[email protected]> wrote:
> Thanks for the reminder! I've updated the PARQUET-686 PR so it is ready for
> comments. Thanks, everyone!
>
> On Fri, Apr 14, 2017 at 3:25 PM, Julien Le Dem <[email protected]> wrote:
>
> > Reminder:
> > give feedback in:
> > - https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
> > - https://github.com/apache/parquet-format/pull/51
> >   <https://github.com/apache/parquet-format/pull/51/files>
> > - (once updated by Ryan) https://github.com/apache/parquet-format/pull/46
> >
> > On Wed, Apr 12, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:
> >
> > > Notes from the sync (Full room today!)
> > >
> > > Zoltan (Cloudera, Parquet)
> > > Cheng (Databricks, Parquet - Spark integration): Index discussion
> > > Ryan (Netflix): Order changes, Logical type - Timestamp
> > > Deepak (Vertica - Parquet): Timestamp, indexes
> > > Greg (Cloudera): Timestamp
> > > Lars (Cloudera, Impala): Min/Max #46, feedback on indices
> > > Marcel (Cloudera, Impala): Min/Max #46, Index pages
> > > QinHui (Criteo): Migration project from JSON to Parquet using Protobufs.
> > > Problem related to this.
> > > Srinath (Databricks): Indexing
> > > Julien (Dremio): Min/Max, Index discussion
> > >
> > > Min/max: https://github.com/apache/parquet-format/pull/46
> > > - Discussed forward-compatibility requirements to have ColumnOrder as the
> > >   gatekeeper for interpreting the min_value and max_value fields
> > > - Having the signed field is redundant and unnecessary
> > > - Action: Ryan to update the PR for final review this week (everyone).
> > >
> > > Index: https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
> > > - 2 types of lookup structures.
> > >   - SortColumnIndex: index of values on sorted columns. (just boundary
> > >     values) (only for main sorting column)
> > >     - (name should be changed as it applies even if the column is not
> > >       sorted)
> > >   - OffsetIndex: locate data pages by row number.
> > >   SortColumnIndex is used to narrow down the pages to apply a filter on.
> > >   OffsetIndex is used to find the selected rows in the other columns
> > >   (projected but not filtered on)
> > > - Lars and Marcel to make sure the doc is linked in the JIRA and the JIRA
> > >   referred to in the title.
> > > - Action for everyone: Provide feedback before April 19.
> > > - After that create a PR in parquet-format (labelled experimental spec
> > >   until a reference implementation is finalized).
> > >
> > > Timestamp: https://github.com/apache/parquet-format/pull/51
> > > <https://github.com/apache/parquet-format/pull/51/files>
> > > - PR #51 replaces the current LogicalType enum with a better,
> > >   forward-compatible, union-based definition.
> > > - Action for everyone: Provide feedback before April 19.
> > >
> > > Protobuf:
> > > - QinHui to propose JIRA/PR for saving field ids in schema for protobufs.
> > > - capture unknown fields for which we only know the ID
> > >
> > > On Wed, Apr 12, 2017 at 9:57 AM, Julien Le Dem <[email protected]> wrote:
> > >
> > > > Marcel and Lars' doc:
> > > > https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#heading=h.ft5dh2chrcjb
> > > >
> > > > On Wed, Apr 12, 2017 at 9:51 AM, Julien Le Dem <[email protected]> wrote:
> > > >
> > > > > 10am PT today on google hangout:
> > > > > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> > > > >
> > > > > --
> > > > > Julien
> > > >
> > > > --
> > > > Julien
> > >
> > > --
> > > Julien
> >
> > --
> > Julien
>
> --
> Ryan Blue
> Software Engineer
> Netflix

--
Julien
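
For anyone reading the Index notes above without the doc open, here is a rough Python sketch of how the two lookup structures could work together: boundary values narrow down the pages of the filtered column, and row numbers then locate the pages of a projected column. This is only an illustration under assumptions. The class names, field names, and function names (PageBounds, PageLocation, pages_matching, row_range, pages_covering) are hypothetical and are not taken from the design doc or from parquet-format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PageBounds:
    """Hypothetical boundary-value entry for one data page."""
    min_value: int
    max_value: int


@dataclass
class PageLocation:
    """Hypothetical offset-index entry for one data page."""
    first_row: int    # row number of the first row stored in the page
    file_offset: int  # byte offset of the page in the file


def pages_matching(bounds, lo, hi):
    """Indices of pages whose [min, max] range overlaps the filter [lo, hi]."""
    return [i for i, b in enumerate(bounds)
            if b.max_value >= lo and b.min_value <= hi]


def row_range(locations, total_rows, page):
    """Half-open row range [start, end) covered by the given page."""
    start = locations[page].first_row
    if page + 1 < len(locations):
        end = locations[page + 1].first_row
    else:
        end = total_rows
    return start, end


def pages_covering(locations, start, end):
    """Indices of pages that hold at least one row in [start, end)."""
    first_rows = [loc.first_row for loc in locations]
    first = bisect_right(first_rows, start) - 1
    last = bisect_right(first_rows, end - 1) - 1
    return list(range(first, last + 1))


# Filtered column: three pages of 100 rows each (made-up numbers).
filter_bounds = [PageBounds(1, 10), PageBounds(11, 20), PageBounds(21, 30)]
filter_locations = [PageLocation(0, 4), PageLocation(100, 900),
                    PageLocation(200, 1800)]

# Projected column: different page boundaries over the same 300 rows.
other_locations = [PageLocation(0, 2700), PageLocation(150, 4100),
                   PageLocation(250, 5200)]

candidate_pages = pages_matching(filter_bounds, 12, 15)
rows = [row_range(filter_locations, 300, p) for p in candidate_pages]
other_pages = [pages_covering(other_locations, s, e) for s, e in rows]
```

The point of the split is that page boundaries differ from column to column, so the filtered column yields row ranges rather than page numbers, and each projected column translates those row ranges back into its own pages.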
