<re-sending from another account>

Tham, thank you for this, and for volunteering early for the C++ version work, driving it forward and creating the bulk of the parquet-cpp encryption code along the way.

@All - this announcement means that two implementations of Parquet encryption, fully conforming to the formal specification, are available today. Thanks to Revital for contributing to C++ version compliance with the encryption spec, and for running a set of basic Java-C++ encryption interoperability tests. We have tested plaintext and encrypted footer modes, GCM and GCM_CTR algorithms, new and legacy readers. Files written with parquet-cpp are successfully parsed by parquet-mr, and vice versa.

Let me also thank Junjie, Nandor, Anna and Xinli for their support and vote for the encryption specification - along with the PMC folks.

All parquet-format pull requests have by now been merged into the encryption branch:
https://github.com/apache/parquet-format/tree/encryption

The community is welcome to review the parquet-mr pull requests, in the following order:
https://github.com/apache/parquet-mr/pull/613
https://github.com/apache/parquet-mr/pull/614
https://github.com/apache/parquet-mr/pull/643

Currently, an end-to-end implementation of Java (mr) Parquet encryption is collected in this branch:
https://github.com/ggershinsky/parquet-mr/tree/encr

Thanks to Xinli for working with this branch code, and contributing to it based on his field experience. Everybody is welcome to do the same.

@All - it would be helpful to review & merge the above PRs in apache/parquet-mr/encryption, so that folks can work with it instead of my private branch. And I certainly second Tham's call to review & merge the parquet-cpp pull requests.

By now, we have a number of companies starting to utilize Parquet encryption (both C++ and Java), including IBM.

Cheers, Gidon.

On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]> wrote:

> Hi community,
>
> After a long time of development, I'm honored to announce that we have just
> completed the C++ parquet encryption module, which implements encryption in
> the low-level API, with examples included.
>
> For getting this feature completed, I would like to thank Gidon and Revital
> for their contribution.
>
> Gidon had a key role in the encryption design and in writing the Java
> version code on which we based the C++ version. He also wrote the crypto
> package in the C++ version.
>
> Revital and I have been working together on writing the C++ version. Revital
> was responsible for AAD calculations, API updating (to be the same as the
> Java version) and Java-C++ interoperation testing. I wrote the first draft
> (properties, metadata, writer, reader) and kept it updated when the crypto
> package changed.
>
> We have had a great time cooperating. Thanks to Gidon and Revital, too, for
> all the guidance and experience I have received from them.
>
> Here are the links to the pull requests:
>
> 1) encryption module (properties, metadata, writer, reader):
> https://github.com/apache/arrow/pull/2555.
>
> 2) some merged pull requests for the new thrift structure and crypto
> algorithm, and one still open: https://github.com/apache/arrow/pull/3520
>
> However, in order to make (1) buildable with the current build scripts, we
> need “adding openssl in C++ build toolchain”, which is mentioned in this
> jira: https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful
> if someone could help fulfill this work.
>
> The current pull requests are already in use in our development phase at
> Emotiv (https://www.emotiv.com/). We love using parquet files to store EEG
> data. We are going to release a product with encrypted parquet files soon
> and look forward to the official release of the parquet encryption feature.
> So we would be very thankful and honored to have you review and merge them
> (if they qualify).
>
> Thank you very much!
>
> Tham
>
