Hi Deepak, If it simplifies the code review, we can make this change. Actually, we're already creating a single branch that has the parquet-cpp with encryption in one place, ready for build and sample run. This is a private branch - but we can do the same for the pull request (2555), making it the staging branch. However, there are a number of technical details we need to address first:
1) The private branch we're creating contains not only the cpp encryption sample, but also an additional interop sample that checks interoperability with parquet-mr encryption. Do we want to add this interop sample to the staging branch? Should this sample be merged in arrow/parquet-cpp for future interop testing with Java encryption? Or this sample can stay in the staging branch for the duration of the review process, and be removed just before the staging PR is merged in arrow/parquet-cpp? Or we dont add this interop sample to the staging branch at all, and keep our private branch for interop testing? Let us know your preference. 2) PR 3520 has been reviewed. We'll make the requested changes this week, and update the PR. It adds 2 new files, and doesnt touch the existing ones - if you guys will be ok with the updated version, you can merge it, and then the staging PR (2555) will automatically pick the 3520 patch upon rebasing to the then-current master state. But, if further changes will be required in the #3520 files, we'll add them to the staging PR. Cheers, Gidon. On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <[email protected]> wrote: > Hi Gidon and Tham, > > I opened a PR for OpenSSL build support here: > https://github.com/apache/arrow/pull/4384/files > However, your PRs need to be rebased/merged with master. > Why don't we keep one staging PR for C++? Probably make > https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick > https://github.com/apache/arrow/pull/3520/files and the above PR onto it. > The later PRs are tiny anyway. This simplifies code review, syncing with > the master, as well as building the encryption work while reviewing. > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <[email protected]> > wrote: > > > Thanks guys. We'll create a doc with a description of the current interop > > tests, including details on how to build/run them. > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <[email protected]> > > wrote: > > > > > I will make the first attempt to review the C++ code. I reviewed part > of > > it > > > a while ago. > > > Java-C++ inter-operation testing is my main concern. Can you share some > > > details on what has been tested? > > > I need to update myself with the key management part and see if we can > > add > > > some test parquet files written in Java along with the keys to the data > > > folder. > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <[email protected]> > > wrote: > > > > > > > hi folks, > > > > > > > > I'm extraordinarily busy but it seems likely that the code review for > > > > Parquet C++ is largely going to fall on my shoulders. Since this is a > > > > large patch with many touch-points in the C++ codebase, and there > > > > hasn't been much feedback over the course of its development, I would > > > > like to leave thoughtful code review so that we can avoid issues that > > > > may lead to burdensome maintenance issues. > > > > > > > > On the OpenSSL issue -- there are many CMake projects in the wild > that > > > > use OpenSSL and there are many examples of how to incorporate other > > > > dependencies into the Arrow build system. I don't have time to work > on > > > > this right now so someone else will have to sort out the toolchain > > > > issue > > > > > > > > Thanks > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <[email protected]> > > > wrote: > > > > > > > > > > <re-sending from another account> > > > > > > > > > > Tham, thank you for this! and for volunteering early for the C++ > > > version > > > > > work, driving it forward and creating the bulk of the parquet-cpp > > > > > encryption code along the way. > > > > > > > > > > @All - this announcement means that two implementations of Parquet > > > > > encryption, fully conforming to the formal specification, are > > available > > > > > today. Thanks to Revital for contributing to C++ version compliance > > > with > > > > > the encryption spec, and for running a set of basic Java-C++ > > encryption > > > > > interoperability tests. We have tested plaintext and encrypted > footer > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files > > > written > > > > > with parquet-cpp are successfully parsed by parquet-mr, and vice > > versa. > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their support > > and > > > > vote > > > > > for the encryption specification - along with the PMC folks. > > > > > > > > > > All parquet-format pull requests are merged by now into the > > encryption > > > > > branch, > > > > > https://github.com/apache/parquet-format/tree/encryption > > > > > > > > > > The community is welcome to review the parquet-mr pull requests, in > > the > > > > > following order: > > > > > https://github.com/apache/parquet-mr/pull/613 > > > > > https://github.com/apache/parquet-mr/pull/614 > > > > > https://github.com/apache/parquet-mr/pull/643 > > > > > > > > > > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet > > encryption > > > > is > > > > > collected in this branch: > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr > > > > > Thanks to Xinli for working with this branch code, and contributing > > to > > > it > > > > > based on his field experience. Everybody is welcome to do the same. > > > > > @All - it would be helpful to review & merge the above PRs in > > > > > apache/parquet-mr/encryption, so that folks can work with it > instead > > of > > > > my > > > > > private branch.. > > > > > > > > > > And I certainly second Tham's call to review & merge the > parquet-cpp > > > pull > > > > > requests. By now, we have a number of companies starting to utilize > > > > Parquet > > > > > encryption (both C++ and Java), including IBM. > > > > > > > > > > > > > > > Cheers, Gidon. > > > > > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]> wrote: > > > > > > > > > > > Hi community, > > > > > > > > > > > > > > > > > > > > > > > > After a long time of development, I'm honor to announce that we > > have > > > > just > > > > > > completed C++ parquet encryption module which implements > encryption > > > in > > > > low > > > > > > level api and with examples included. > > > > > > > > > > > > > > > > > > > > > > > > To have this feature completed, I would like to thank Gidon and > > > > Revital for > > > > > > their contribution. > > > > > > > > > > > > Gidon had a key role in encryption design and in writing Java > > version > > > > code > > > > > > on which we based on to write C++ version. He also wrote crypto > > > > package in > > > > > > C++ version. > > > > > > > > > > > > Revital and me has been joining together in writing C++ version. > > > > Revital > > > > > > was responsible for AAD calculations, API updating (to be the > same > > > with > > > > > > Java version) and Java-C++ inter-operation testing. I was writing > > the > > > > first > > > > > > draft (properties, metadata, writer, reader) and keeping them > > updated > > > > when > > > > > > crypto package change. > > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon and Revital > for > > > all > > > > > > guide and experience I have received from them, too. > > > > > > > > > > > > > > > > > > > > > > > > Here are the links of pull requests: > > > > > > > > > > > > 1) encryption module (properties, metadata, writer, reader): > > > > > > https://github.com/apache/arrow/pull/2555. > > > > > > > > > > > > 2) some merged pull requests for new thrift structure and crypto > > > > algorithm, > > > > > > and one still open: https://github.com/apache/arrow/pull/3520 > > > > > > > > > > > > > > > > > > > > > > > > However, in order to make (1) buildable with current build > scripts, > > > we > > > > need > > > > > > “adding openssl in C++ build toolchain” which is mentioned in > this > > > > jira: > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be > > grateful > > > > if > > > > > > someone could help fullfill this work. > > > > > > > > > > > > > > > > > > > > > > > > About current pull requests, they has been currently using in our > > > > > > development phase at Emotiv (https://www.emotiv.com/). We love > > using > > > > > > parquet files to store EEG data. We are going to release a > product > > > with > > > > > > encrypted parquet files soon and look forward to the official > > release > > > > of > > > > > > parquet encryption feature. So it will be many thank and great > > honor > > > to > > > > > > have you review and merge them (if qualified). > > > > > > > > > > > > > > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > Tham > > > > > > > > > > > > > > > > > > > -- > > > regards, > > > Deepak Majeti > > > > > > > > -- > regards, > Deepak Majeti >
