Hi all,

Yesterday we talked about the status of columnar encryption and agreed that we need a reviewed spec before anything related to it gets released. Gidon has already opened a PR for this: https://github.com/apache/parquet-format/pull/101. It is based on the design doc he wrote (https://docs.google.com/document/d/1T89G7xR0zHFV1f2pjTO28jtfVm8qoNVGEJQ70Rsk-bY/edit). Julien, Ryan, what do you think? Is anything else needed?
Regards,
Nandor

On Tue, Aug 28, 2018 at 7:16 PM, Julien Le Dem <julien.le...@wework.com.invalid> wrote:

> Notes:
> Anna (Cloudera): Bloom filter update, Iceberg
> Gabor, Nandor (Cloudera):
>
> - Value skipping implementation to be reviewed. Move Java code from
>   parquet-format to parquet-mr. PR ready.
> - How can users of Parquet handle timestamps and TZs? Allow writing
>   timestamps in Java. Refactor the original type logic into a more flexible
>   new original type API.
> - Column indexes and alignment of pages.
> - Limiting the number of records in a page to avoid skewed splits when
>   compression is really good.
>
> Ryan (Netflix): Iceberg stuff back to Parquet: expression library for
> push-down. Dictionary- and stats-based row group filtering.
> JunJie (Intel): Bloom filter. Needs more reviews. Have a vote on the design
> and add it to parquet-format.
> Julien (WeWork): Encryption.
>
> - Bloom filter: https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-41
>   - Committed utility class to parquet-cpp.
>   - Uploaded the benchmark results.
>   - Ready to add to the spec.
>   - Submit a PR for the Parquet reader spec.
>   - *Action*: review the Parquet Java utility class.
>     https://github.com/apache/parquet-mr/pull/425
> - Encryption:
>   - Nandor and Gabor are reviewing.
>   - APIs to allow pluggable key management.
>   - Needs a proper review of the spec.
>   - Needs more testing.
> - Column indices:
>   - PR to be reviewed: https://github.com/apache/parquet-mr/pull/514
>   - Ryan: to review the features branch.
> - Moving Java code from parquet-format to parquet-mr:
>   - *Action*: review. https://github.com/apache/parquet-mr/pull/517
>   - Gets the Thrift file from the released parquet-format artifact.
> - Maximum number of records per page:
>   - We should add a property for the maximum number of records per page
>     and per row group.
>   - Need to benchmark to find a good default. 10K?
> - Iceberg:
>   - Some of the Iceberg code should be in Parquet:
>     - Rewrote the record reconstruction stack:
>       - Reuses the page reader and decoder.
>       - Then uses a triple iterator that returns an entire column in a
>         file (an iterator of triples).
>       - A record reconstruction class that handles everything the
>         current one does, but with {list, map} factories.
>       - 20% faster to write, 5% faster to read.
>       - Easier to write object mappers.
>       - Helps with page-level skipping.
>   - High-level abstractions in the Iceberg library:
>     - Take an expression and simplify it (not, ...) so it can run on
>       metadata.
>     - Take a complex expression and split it into the part that can be
>       evaluated on partition/min/max values and the remaining part.
>
> On Mon, Aug 27, 2018 at 4:56 AM, Nandor Kollar <nkol...@cloudera.com.invalid> wrote:
>
>> Yes, CEST.
>>
>> On Mon, Aug 27, 2018 at 1:01 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
>> > Hello Nandor,
>> >
>> > I can probably make this time. Just a timezone question: is it 6pm CET
>> > or 6pm CEST? I guess the latter.
>> >
>> > See http://timesched.pocoo.org/?date=2018-08-28&tz=central-europe-standard-time!,pacific-standard-time&range=1080,1140
>> >
>> > Uwe
>> >
>> > On Mon, Aug 27, 2018, at 12:20 PM, Nandor Kollar wrote:
>> >> Hi All,
>> >>
>> >> As discussed at the last Parquet sync, I propose another meeting
>> >> on August 28th at 6pm CET / 9 am PST to discuss the topics we
>> >> didn't have time for at the August 15th sync, and of course any new
>> >> topics too.
>> >>
>> >> Sorry for the late notice. Feel free to propose another time slot if
>> >> this one is not suitable for you! Calendar entry to follow.
>> >>
>> >> Regards,
>> >> Nandor