Reminder:
give feedback in:
 - https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
 - https://github.com/apache/parquet-format/pull/51
<https://github.com/apache/parquet-format/pull/51/files>
 - (once updated by Ryan) https://github.com/apache/parquet-format/pull/46
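
To make the index proposal in the notes below easier to review, here is a rough sketch of how the two lookup structures might work together: boundary values narrow down the pages of the filtered column, and row-number offsets locate the matching pages in a projected column. All class, field, and function names here are hypothetical and are not taken from the design doc or from parquet-format. The sketch also assumes the filtered column has page row offsets of its own, which the notes do not state.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical structures for illustration only; names are assumptions.

@dataclass
class PageBounds:
    """Boundary values for one data page of the filtered column."""
    min_value: int
    max_value: int

@dataclass
class PageLocation:
    """Row offset for one data page of a column."""
    first_row: int  # row number of the first row stored in the page


def pages_matching(bounds, lo, hi):
    """Return indexes of pages whose [min, max] range overlaps [lo, hi]."""
    return [i for i, b in enumerate(bounds)
            if b.max_value >= lo and b.min_value <= hi]


def row_ranges(filter_locs, page_ids, total_rows):
    """Translate page indexes of the filtered column into half-open row ranges."""
    ranges = []
    for i in page_ids:
        start = filter_locs[i].first_row
        if i + 1 < len(filter_locs):
            end = filter_locs[i + 1].first_row
        else:
            end = total_rows
        ranges.append((start, end))
    return ranges


def pages_for_rows(locs, ranges):
    """Find pages of a projected column that hold any row in the given ranges."""
    starts = [p.first_row for p in locs]
    hit = set()
    for start, end in ranges:
        first = bisect_right(starts, start) - 1    # page containing `start`
        last = bisect_right(starts, end - 1) - 1   # page containing the last row
        hit.update(range(first, last + 1))
    return sorted(hit)


# Filtered column: 3 pages of 100 rows each; projected column: 2 pages.
bounds = [PageBounds(0, 9), PageBounds(10, 19), PageBounds(20, 29)]
filter_locs = [PageLocation(0), PageLocation(100), PageLocation(200)]
other_locs = [PageLocation(0), PageLocation(150)]

matched = pages_matching(bounds, 12, 15)
rows = row_ranges(filter_locs, matched, total_rows=300)
projected_pages = pages_for_rows(other_locs, rows)
```

The point of the split is that only the filtered column needs boundary values, while every projected column needs nothing more than page row offsets to skip pages that hold no selected rows.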

On Wed, Apr 12, 2017 at 11:22 AM, Julien Le Dem <[email protected]> wrote:

> Notes from the sync (Full room today!)
>
> Zoltan (Cloudera, Parquet)
> Cheng (Databricks, Parquet - Spark integration): Index discussion
> Ryan (Netflix): Order changes, Logical type - Timestamp
> Deepak (Vertica - Parquet): Timestamp, indexes
> Greg (Cloudera): Timestamp
> Lars (Cloudera, Impala): Min/Max #46, feedback on indices
> Marcel (Cloudera, Impala): Min/Max #46, Index pages
> QinHui (Criteo): Migration project from JSON to Parquet using Protobuf.
> Problem related to this.
> Srinath (Databricks): Indexing
> Julien (Dremio): Min/Max, Index discussion
>
> Min/max: https://github.com/apache/parquet-format/pull/46
>  - Discussed forward compatibility requirements to have ColumnOrder as the
> gatekeeper to interpret the min_value and max_value fields
>  - having the signed field is redundant and unnecessary
>  - Action: Ryan to update the PR for final review this week (everyone).
>
> Index: https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#
>  - 2 types of lookup structures.
>   - SortColumnIndex: index of values on sorted columns. (just boundary
> values) (only for main sorting column)
>      - (name should be changed as it applies even if the column is not
> sorted)
>   - OffsetIndex: locate data pages by row number.
> SortColumnIndex is used to narrow down the pages to apply a filter on.
> OffsetIndex is used to find the selected rows in the other columns
> (projected but not filtered on).
> - Lars and Marcel to make sure the doc is linked in the JIRA and the JIRA
> referred to in the title.
> - Action for everyone: Provide feedback before April 19.
> - After that create a PR in parquet-format (labelled experimental spec
> until a reference implementation is finalized).
>
> Timestamp: https://github.com/apache/parquet-format/pull/51
> <https://github.com/apache/parquet-format/pull/51/files>
>  - PR #51 replaces the current LogicalType enum with a better and forward
> compatible union based definition.
>  - Action for everyone: Provide feedback before April 19.
>
>  Protobuf:
>  - QinHui to propose JIRA/PR for saving field ids in schema for protobufs.
>  - capture unknown fields for which we only know the ID
>
>
> On Wed, Apr 12, 2017 at 9:57 AM, Julien Le Dem <[email protected]> wrote:
>
>> Marcel and Lars' doc:
>> https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit#heading=h.ft5dh2chrcjb
>>
>> On Wed, Apr 12, 2017 at 9:51 AM, Julien Le Dem <[email protected]> wrote:
>>
>>> 10am PT today on google hangout:
>>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien
