Thanks for the reminder! I've updated the PARQUET-686 PR so it is ready for
comments. Thanks, everyone!

On Fri, Apr 14, 2017 at 3:25 PM, Julien Le Dem <[email protected]> wrote:

> Reminder:
> give feedback in:
>  - https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
>  - https://github.com/apache/parquet-format/pull/51
> <https://github.com/apache/parquet-format/pull/51/files>
>  - (once updated by Ryan) https://github.com/apache/parquet-format/pull/46
>
> On Wed, Apr 12, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:
>
> > Notes from the sync (Full room today!)
> >
> > Zoltan (Cloudera, Parquet)
> > Cheng (Databricks, Parquet - Spark integration): Index discussion
> > Ryan (Netflix): Order changes, Logical type - Timestamp
> > Deepak (Vertica - Parquet): Timestamp, indexes
> > Greg (Cloudera): Timestamp
> > Lars (Cloudera, Impala): Min/Max #46, feedback on indices
> > Marcel (Cloudera, Impala): Min/Max #46, Index pages
> > QinHui (Criteo): Migration project from JSON to Parquet using Protobuf,
> > and a problem related to it.
> > Srinath (Databricks): Indexing
> > Julien (Dremio): Min/Max, Index discussion
> >
> > Min/max: https://github.com/apache/parquet-format/pull/46
> >  - Discussed forward-compatibility requirements for having ColumnOrder as
> > the gatekeeper to interpret the min_value and max_value fields.
> >  - Having the signed field is redundant and unnecessary.
> >  - Action: Ryan to update the PR for final review this week (everyone).
> >
> > Index: https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
> >  - 2 types of lookup structures.
> >   - SortColumnIndex: index of values on sorted columns. (just boundary
> > values) (only for main sorting column)
> >      - (name should be changed as it applies even if the column is not
> > sorted)
> >   - OffsetIndex: locate data pages by row number.
> > SortColumnIndex is used to narrow down the pages to apply a filter on.
> > OffsetIndex is used to find the selected rows in the other columns
> > (projected but not filtered on).
> > - Lars and Marcel to make sure the doc is linked in the JIRA and the JIRA
> > referred to in the title.
> > - Action for everyone: Provide feedback before April 19.
> > - After that create a PR in parquet-format (labelled experimental spec
> > until a reference implementation is finalized).
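
The way the two lookup structures in the notes above work together can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed spec: the class names PageBounds and PageLocation, their fields, and the helper functions are hypothetical, and the sketch assumes the value index keeps a min/max pair per page and the offset index keeps the first row number of each page.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PageBounds:
    """Hypothetical per-page entry of a SortColumnIndex-like structure."""
    min_value: int
    max_value: int


@dataclass
class PageLocation:
    """Hypothetical per-page entry of an OffsetIndex-like structure."""
    first_row: int  # row number of the first row stored in the page
    offset: int     # byte offset of the page in the file


def pages_matching(bounds, lo, hi):
    """Indices of pages whose [min, max] range overlaps the filter [lo, hi]."""
    return [i for i, b in enumerate(bounds)
            if b.max_value >= lo and b.min_value <= hi]


def row_ranges(locations, page_ids, total_rows):
    """Half-open (start, end) row ranges covered by the given pages."""
    ranges = []
    for i in page_ids:
        start = locations[i].first_row
        if i + 1 < len(locations):
            end = locations[i + 1].first_row
        else:
            end = total_rows
        ranges.append((start, end))
    return ranges


def pages_for_rows(locations, ranges):
    """Pages of another column that contain at least one row in the ranges."""
    starts = [p.first_row for p in locations]
    hits = set()
    for start, end in ranges:
        first = bisect_right(starts, start) - 1
        last = bisect_right(starts, end - 1) - 1
        hits.update(range(first, last + 1))
    return sorted(hits)


# Filtered column: three pages of 100 rows each (made-up numbers).
filter_bounds = [PageBounds(0, 9), PageBounds(10, 19), PageBounds(20, 29)]
filter_locs = [PageLocation(0, 0), PageLocation(100, 4096), PageLocation(200, 8192)]
# Projected column: different page boundaries over the same 300 rows.
other_locs = [PageLocation(0, 0), PageLocation(150, 5000), PageLocation(250, 9000)]

candidate_pages = pages_matching(filter_bounds, 12, 15)
candidate_rows = row_ranges(filter_locs, candidate_pages, total_rows=300)
pages_to_read = pages_for_rows(other_locs, candidate_rows)
```

In this made-up layout the filter touches only the second page of the filtered column, and the row numbers of that page are then used to pick the pages of the projected column that need to be read.
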
> >
> > Timestamp: https://github.com/apache/parquet-format/pull/51
> > <https://github.com/apache/parquet-format/pull/51/files>
> >  - PR #51 replaces the current LogicalType enum with a better and forward
> > compatible union based definition.
> >  - Action for everyone: Provide Feedback before April 19
> >
> >  Protobuf:
> >  - QinHui to propose a JIRA/PR for saving field IDs in the schema for
> > protobufs.
> >  - Capture unknown fields for which we only know the ID.
> >
> >
> >
> >
> >
> >
> > On Wed, Apr 12, 2017 at 9:57 AM, Julien Le Dem <[email protected]> wrote:
> >
> >> Marcel and Lars' doc:
> >> https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#heading=h.ft5dh2chrcjb
> >>
> >> On Wed, Apr 12, 2017 at 9:51 AM, Julien Le Dem <[email protected]> wrote:
> >>
> >>> 10am PT today on google hangout:
> >>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>>
> >>> --
> >>> Julien
> >>>
> >>
> >>
> >>
> >> --
> >> Julien
> >>
> >
> >
> >
> > --
> > Julien
> >
>
>
>
> --
> Julien
>



-- 
Ryan Blue
Software Engineer
Netflix
