Hi all,

Yesterday we talked about the status of columnar encryption and agreed
that before anything related to it gets released, we need a reviewed
spec. Gidon has already opened a PR for this:
https://github.com/apache/parquet-format/pull/101. It is based on the
design doc he wrote
(https://docs.google.com/document/d/1T89G7xR0zHFV1f2pjTO28jtfVm8qoNVGEJQ70Rsk-bY/edit).
Julien, Ryan, what do you think? Is there anything else needed?

Regards,
Nandor

On Tue, Aug 28, 2018 at 7:16 PM, Julien Le Dem
<julien.le...@wework.com.invalid> wrote:
> Notes:
> Anna (Cloudera): Bloom filter update, Iceberg
> Gabor, Nandor (Cloudera):
>
>    - Value skipping implementation to be reviewed. Move Java code from
>    parquet-format to parquet-mr. PR ready.
>    - How Parquet users can handle timestamps and time zones: allow writing
>    timestamps in Java; refactor the original type logic into a more
>    flexible new API.
>    - Column indexes and alignment of pages
>    - Limiting the number of records in a page to avoid skewed splits when
>    compression is really good.
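A minimal sketch of the record-count limit mentioned above, written in Python for brevity (parquet-mr itself is Java). It assumes a writer that cuts a page when *either* the buffered byte size or the buffered record count reaches a threshold. `PageBuffer`, `PAGE_SIZE_BYTES` and `MAX_RECORDS_PER_PAGE` are hypothetical names, not parquet-mr properties, and 10,000 is only the default floated (with a question mark) later in these notes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical thresholds for illustration only; 10K records is the value
# floated in the meeting notes, not a decided default.
PAGE_SIZE_BYTES = 1024 * 1024
MAX_RECORDS_PER_PAGE = 10_000


@dataclass
class PageBuffer:
    """Toy page writer that cuts a page on byte size OR record count."""
    page_size_bytes: int = PAGE_SIZE_BYTES
    max_records: int = MAX_RECORDS_PER_PAGE
    pages: List[int] = field(default_factory=list)  # record count of each flushed page
    _records: int = 0
    _bytes: int = 0

    def write(self, encoded_size: int) -> None:
        # encoded_size: size of the record after encoding/compression.
        self._records += 1
        self._bytes += encoded_size
        if self._bytes >= self.page_size_bytes or self._records >= self.max_records:
            self.flush()

    def flush(self) -> None:
        # Emit the current page (here: just remember its record count).
        if self._records:
            self.pages.append(self._records)
            self._records = 0
            self._bytes = 0


# Highly compressible data: every record encodes to ~1 byte, so the size
# threshold alone would never be reached for these 25,000 records.
buf = PageBuffer()
for _ in range(25_000):
    buf.write(1)
buf.flush()
print(buf.pages)  # [10000, 10000, 5000]
```

With a size-only rule, all 25,000 records would land in a single page; the extra count check caps each page at 10,000 records, which keeps pages (and therefore splits) more evenly sized.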
>
> Ryan (Netflix): Bring Iceberg work back to Parquet: expression library for
> push-down; dictionary- and stats-based row group filtering.
> JunJie (Intel): Bloom filter. Need more reviews. Have a vote on the design
> and add it to parquet-format.
> Julien (WeWork): Encryption.
>
>
>    - Bloom filter:
>      https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-41
>       - Committed utility class to parquet-cpp.
>       - Uploaded the benchmark results.
>       - Ready to add to the spec.
>       - Submit a PR for the Parquet reader spec.
>       - *Action*: review the Parquet Java utility class:
>         https://github.com/apache/parquet-mr/pull/425
>    - Encryption:
>       - Nandor, Gabor reviewing.
>       - APIs to allow pluggable key management.
>       - Need a proper review of the spec.
>       - Need more testing.
>    - Column indices:
>       - PR to be reviewed: https://github.com/apache/parquet-mr/pull/514
>       - Ryan to review the feature branch.
>    - Moving Java code from parquet-format to parquet-mr:
>       - *Action*: review https://github.com/apache/parquet-mr/pull/517
>       - Gets the Thrift file from the released parquet-format artifact.
>    - Maximum number of records per page:
>       - We should add a property for the maximum number of records per page
>         and per row group.
>       - Need to benchmark to find a good default. 10K?
>    - Iceberg:
>       - Some of the Iceberg code should be in Parquet:
>          - Rewrote the record reconstruction stack:
>             - Reuses the page reader and decoder.
>             - Then a triple iterator returns an entire column in a file
>               (an iterator of triples).
>             - A record reconstruction class handles everything the current
>               one does, but with {list, map} factories:
>                - 20% faster to write, 5% faster to read.
>                - Easier to write object mappers.
>             - Helps with page-level skipping.
>          - High-level abstractions in the Iceberg library:
>             - Take an expression and simplify it (not, ...) to run on
>               metadata.
>             - Take a complex expression and split it into the part that
>               runs on partition/min/max metadata and the remaining part.
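A minimal sketch of the two Iceberg-style steps listed above: rewriting an expression so that `not` is pushed down to the leaf predicates (De Morgan), then evaluating it conservatively against one row group's min/max statistics. The names `Pred`, `rewrite_not` and `might_match` are hypothetical and are not Iceberg's actual API. It assumes non-null, non-NaN numeric columns (with nulls, `not (x < 5)` is not equivalent to `x >= 5`), and it does not show splitting an expression into a metadata part and a residual.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Pred:
    column: str
    op: str  # one of "<", "<=", ">", ">=", "==", "!="
    value: float


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    child: "Expr"


Expr = Union[Pred, And, Or, Not]
Stats = Dict[str, Tuple[float, float]]  # column -> (min, max) for one row group

# Negation of each comparison, valid only under the no-nulls/no-NaN assumption.
NEGATED_OP = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


def rewrite_not(e: Expr) -> Expr:
    """Push every Not down to the leaves so the result contains no Not nodes."""
    if isinstance(e, Not):
        c = e.child
        if isinstance(c, Pred):
            return Pred(c.column, NEGATED_OP[c.op], c.value)
        if isinstance(c, Not):
            return rewrite_not(c.child)
        if isinstance(c, And):  # not (a and b) == (not a) or (not b)
            return Or(rewrite_not(Not(c.left)), rewrite_not(Not(c.right)))
        if isinstance(c, Or):  # not (a or b) == (not a) and (not b)
            return And(rewrite_not(Not(c.left)), rewrite_not(Not(c.right)))
    if isinstance(e, And):
        return And(rewrite_not(e.left), rewrite_not(e.right))
    if isinstance(e, Or):
        return Or(rewrite_not(e.left), rewrite_not(e.right))
    return e


def might_match(e: Expr, stats: Stats) -> bool:
    """Return False only when the min/max stats prove that no row can match."""
    if isinstance(e, And):
        return might_match(e.left, stats) and might_match(e.right, stats)
    if isinstance(e, Or):
        return might_match(e.left, stats) or might_match(e.right, stats)
    if isinstance(e, Not):
        return might_match(rewrite_not(e), stats)
    lo, hi = stats[e.column]
    v = e.value
    if e.op == "<":
        return lo < v
    if e.op == "<=":
        return lo <= v
    if e.op == ">":
        return hi > v
    if e.op == ">=":
        return hi >= v
    if e.op == "==":
        return lo <= v <= hi
    if e.op == "!=":
        return not (lo == hi == v)
    raise ValueError(f"unknown operator {e.op!r}")


# A row group whose "ts" column spans [100, 200].
stats = {"ts": (100.0, 200.0)}
query = Not(Or(Pred("ts", "<", 150.0), Pred("ts", ">", 300.0)))
simplified = rewrite_not(query)  # And(ts >= 150, ts <= 300)
print(might_match(simplified, stats))              # True
print(might_match(Pred("ts", ">", 250.0), stats))  # False: max is 200
```

The evaluation only answers "might match": `False` lets a reader skip the row group entirely, while `True` still requires the remaining filter to run on the actual rows.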
>
> On Mon, Aug 27, 2018 at 4:56 AM, Nandor Kollar <nkol...@cloudera.com.invalid>
> wrote:
>
>> Yes, CEST.
>>
>> On Mon, Aug 27, 2018 at 1:01 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
>> > Hello Nandor,
>> >
>> > I can probably make this time. Just a timezone question: is it 6pm CET
>> > or 6pm CEST? I guess the latter.
>> >
>> > See http://timesched.pocoo.org/?date=2018-08-28&tz=central-europe-standard-time!,pacific-standard-time&range=1080,1140
>> >
>> > Uwe
>> >
>> > On Mon, Aug 27, 2018, at 12:20 PM, Nandor Kollar wrote:
>> >> Hi All,
>> >>
>> >> As discussed at the last Parquet sync, I propose another meeting on
>> >> August 28th at 6pm CET / 9am PST to discuss the topics we didn't have
>> >> time for at the August 15th sync, and of course any new topics too.
>> >>
>> >> Sorry for the late notice; feel free to propose another time slot if
>> >> this one is not suitable for you! Calendar entry to follow.
>> >>
>> >> Regards,
>> >> Nandor
>>
