Thinking of this, it occurred to me that the words "interop" and "test" might be actually misleading in this context. What we have, is a C++ sample, similar to the current reader-writer.cc or the new encryption-reader-writer.cc (both located under the cpp/examples/parquet/low-level-api). The encryption-reader-writer.cc writes one file using one encryption config, and reads it with one decryption config. Our sample is the same, but it writes N files with N different encryption configs, and reads all these files using M different decryption configs. All encryption/decryption configs are meaningful, we're adding detailed comments to each, so this sample becomes an "encryption HelloWorld" on steroids. A good place to see the code usage of API for different encryption/decryption modes.
Now, since we have an identical sample in Java, we can test the interoperability - when files are produced by Java sample and read by C++ sample, or vice versa. Given the first part, maybe we should place it together with other samples, under cpp/examples/parquet/low-level-api in the apache/arrow repo. This has an added benefit of getting built together with arrow/parquet, without dependency on other repos. Here goes the current version: https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-reader-writer-all-crypto-options.cc Cheers, Gidon. On Thu, May 30, 2019 at 12:45 AM Wes McKinney <[email protected]> wrote: > If there are encryption integration testing utilities there's not much > reason not to put them in > > https://github.com/apache/arrow/tree/master/cpp/tools/parquet > > (or similar) > > Then there is no additional build system to maintain > > On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <[email protected]> > wrote: > > > > Hi Gidon, > > > > We should definitely add encrypted parquet files written by parquet-mr in > > the parquet-testing repo data folder. > > I also like the idea of keeping the source files in a separate "src" > folder > > in that repository. I see no downside in doing so. > > > > On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <[email protected]> > wrote: > > > > > Hi Wes, > > > > > > Yep, I meant private as in not under apache/arrow. Will be more careful > > > with that term. > > > > > > https://github.com/apache/parquet-testing seems to currently contain > only > > > data (.parquet) files. > > > Our samples are C++ and Java code files, that write and read .parquet > files > > > in different encryption configurations. > > > We can add these encrypted .parquet files to the apache/parquet-testing > > > repo. > > > Let us know if we also should add the samples themselves there (by > creating > > > src folders in this repo), or add them to the apache/arrow/mr repos, > or to > > > keep them in our personal repos (the ones I've called private before). > > > These samples are based on the standard parquet samples (such as > > > > > > > https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc > > > ), > > > with a loop that creates different encryption configs, and > writes/reads the > > > same data with them. > > > > > > Cheers, Gidon. > > > > > > > > > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <[email protected]> > wrote: > > > > > > > hi Gidon, > > > > > > > > Surely you don't mean "private" private -- this was always on GitHub > > > > right (otherwise this runs against the Apache Way)? > > > > > > > > I think we should try to have all of the testing code in the upstream > > > > repositories. If there are sample files you need to check in (and > they > > > > are not too large), please use the apache/parquet-testing repository > > > > > > > > - Wes > > > > > > > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <[email protected]> > > > wrote: > > > > > > > > > > Hi Deepak, > > > > > > > > > > Sounds good. Here goes the current cpp interop sample (not final > yet). > > > We > > > > > have a sample with a similar code in Java. > > > > > > > > > > > > > https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc > > > > > > > > > > > > > > > Please let me know if we should add this file to the cpp PR and its > > > > staging > > > > > branch (under parquet/low-level-api, or another folder?) > > > > > - or leave this file in our private branch. > > > > > > > > > > Cheers, Gidon. > > > > > > > > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti < > [email protected] > > > > > > > > > wrote: > > > > > > > > > > > Hi Gidon, > > > > > > > > > > > > I agree with your point 2). I merged the PR for OpenSSL build > > > support. > > > > We > > > > > > can do the same for PR-3520 after you push the changes this > week. We > > > > can > > > > > > then simply rebase PR-2555 and make the final review. > > > > > > For 1) I would like to take a look at the inter-op testing > framework > > > > first. > > > > > > Can you share the inter-op sample? Thanks. > > > > > > > > > > > > > > > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky < > [email protected]> > > > > wrote: > > > > > > > > > > > > > Hi Deepak, > > > > > > > > > > > > > > If it simplifies the code review, we can make this change. > > > Actually, > > > > > > we're > > > > > > > already creating a single branch that has the parquet-cpp with > > > > encryption > > > > > > > in one place, ready for build and sample run. This is a private > > > > branch - > > > > > > > but we can do the same for the pull request (2555), making it > the > > > > staging > > > > > > > branch. However, there are a number of technical details we > need to > > > > > > address > > > > > > > first: > > > > > > > > > > > > > > 1) The private branch we're creating contains not only the cpp > > > > encryption > > > > > > > sample, but also an additional interop sample that checks > > > > > > interoperability > > > > > > > with parquet-mr encryption. > > > > > > > Do we want to add this interop sample to the staging branch? > Should > > > > this > > > > > > > sample be merged in arrow/parquet-cpp for future interop > testing > > > with > > > > > > Java > > > > > > > encryption? Or this sample can stay in the staging branch for > the > > > > > > duration > > > > > > > of the review process, and be removed just before the staging > PR is > > > > > > merged > > > > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the > > > > staging > > > > > > > branch at all, and keep our private branch for interop > testing? Let > > > > us > > > > > > know > > > > > > > your preference. > > > > > > > > > > > > > > 2) PR 3520 has been reviewed. We'll make the requested changes > this > > > > week, > > > > > > > and update the PR. It adds 2 new files, and doesnt touch the > > > existing > > > > > > ones > > > > > > > - if you guys will be ok with the updated version, you can > merge > > > it, > > > > and > > > > > > > then the staging PR (2555) will automatically pick the 3520 > patch > > > > upon > > > > > > > rebasing to the then-current master state. But, if further > changes > > > > will > > > > > > be > > > > > > > required in the #3520 files, we'll add them to the staging PR. > > > > > > > > > > > > > > Cheers, Gidon. > > > > > > > > > > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti < > > > > [email protected]> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Gidon and Tham, > > > > > > > > > > > > > > > > I opened a PR for OpenSSL build support here: > > > > > > > > https://github.com/apache/arrow/pull/4384/files > > > > > > > > However, your PRs need to be rebased/merged with master. > > > > > > > > Why don't we keep one staging PR for C++? Probably make > > > > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and > > > > > > cherry-pick > > > > > > > > https://github.com/apache/arrow/pull/3520/files and the > above PR > > > > onto > > > > > > > it. > > > > > > > > The later PRs are tiny anyway. This simplifies code review, > > > syncing > > > > > > with > > > > > > > > the master, as well as building the encryption work while > > > > reviewing. > > > > > > > > > > > > > > > > > > > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky < > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks guys. We'll create a doc with a description of the > > > current > > > > > > > interop > > > > > > > > > tests, including details on how to build/run them. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti < > > > > > > [email protected] > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > I will make the first attempt to review the C++ code. I > > > > reviewed > > > > > > part > > > > > > > > of > > > > > > > > > it > > > > > > > > > > a while ago. > > > > > > > > > > Java-C++ inter-operation testing is my main concern. Can > you > > > > share > > > > > > > some > > > > > > > > > > details on what has been tested? > > > > > > > > > > I need to update myself with the key management part and > see > > > > if we > > > > > > > can > > > > > > > > > add > > > > > > > > > > some test parquet files written in Java along with the > keys > > > to > > > > the > > > > > > > data > > > > > > > > > > folder. > > > > > > > > > > > > > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney < > > > > [email protected] > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > hi folks, > > > > > > > > > > > > > > > > > > > > > > I'm extraordinarily busy but it seems likely that the > code > > > > review > > > > > > > for > > > > > > > > > > > Parquet C++ is largely going to fall on my shoulders. > Since > > > > this > > > > > > > is a > > > > > > > > > > > large patch with many touch-points in the C++ > codebase, and > > > > there > > > > > > > > > > > hasn't been much feedback over the course of its > > > > development, I > > > > > > > would > > > > > > > > > > > like to leave thoughtful code review so that we can > avoid > > > > issues > > > > > > > that > > > > > > > > > > > may lead to burdensome maintenance issues. > > > > > > > > > > > > > > > > > > > > > > On the OpenSSL issue -- there are many CMake projects > in > > > the > > > > wild > > > > > > > > that > > > > > > > > > > > use OpenSSL and there are many examples of how to > > > incorporate > > > > > > other > > > > > > > > > > > dependencies into the Arrow build system. I don't have > time > > > > to > > > > > > work > > > > > > > > on > > > > > > > > > > > this right now so someone else will have to sort out > the > > > > > > toolchain > > > > > > > > > > > issue > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky < > > > > > > [email protected] > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > <re-sending from another account> > > > > > > > > > > > > > > > > > > > > > > > > Tham, thank you for this! and for volunteering early > for > > > > the > > > > > > C++ > > > > > > > > > > version > > > > > > > > > > > > work, driving it forward and creating the bulk of the > > > > > > parquet-cpp > > > > > > > > > > > > encryption code along the way. > > > > > > > > > > > > > > > > > > > > > > > > @All - this announcement means that two > implementations > > > of > > > > > > > Parquet > > > > > > > > > > > > encryption, fully conforming to the formal > specification, > > > > are > > > > > > > > > available > > > > > > > > > > > > today. Thanks to Revital for contributing to C++ > version > > > > > > > compliance > > > > > > > > > > with > > > > > > > > > > > > the encryption spec, and for running a set of basic > > > > Java-C++ > > > > > > > > > encryption > > > > > > > > > > > > interoperability tests. We have tested plaintext and > > > > encrypted > > > > > > > > footer > > > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy > > > readers. > > > > > > Files > > > > > > > > > > written > > > > > > > > > > > > with parquet-cpp are successfully parsed by > parquet-mr, > > > and > > > > > > vice > > > > > > > > > versa. > > > > > > > > > > > > > > > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for > > > their > > > > > > > support > > > > > > > > > and > > > > > > > > > > > vote > > > > > > > > > > > > for the encryption specification - along with the PMC > > > > folks. > > > > > > > > > > > > > > > > > > > > > > > > All parquet-format pull requests are merged by now > into > > > the > > > > > > > > > encryption > > > > > > > > > > > > branch, > > > > > > > > > > > > > https://github.com/apache/parquet-format/tree/encryption > > > > > > > > > > > > > > > > > > > > > > > > The community is welcome to review the parquet-mr > pull > > > > > > requests, > > > > > > > in > > > > > > > > > the > > > > > > > > > > > > following order: > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613 > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614 > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Currently, an end-to-end implementation of Java (mr) > > > > Parquet > > > > > > > > > encryption > > > > > > > > > > > is > > > > > > > > > > > > collected in this branch: > > > > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr > > > > > > > > > > > > Thanks to Xinli for working with this branch code, > and > > > > > > > contributing > > > > > > > > > to > > > > > > > > > > it > > > > > > > > > > > > based on his field experience. Everybody is welcome > to do > > > > the > > > > > > > same. > > > > > > > > > > > > @All - it would be helpful to review & merge the > above > > > PRs > > > > in > > > > > > > > > > > > apache/parquet-mr/encryption, so that folks can work > with > > > > it > > > > > > > > instead > > > > > > > > > of > > > > > > > > > > > my > > > > > > > > > > > > private branch.. > > > > > > > > > > > > > > > > > > > > > > > > And I certainly second Tham's call to review & merge > the > > > > > > > > parquet-cpp > > > > > > > > > > pull > > > > > > > > > > > > requests. By now, we have a number of companies > starting > > > to > > > > > > > utilize > > > > > > > > > > > Parquet > > > > > > > > > > > > encryption (both C++ and Java), including IBM. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, Gidon. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha < > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi community, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > After a long time of development, I'm honor to > announce > > > > that > > > > > > we > > > > > > > > > have > > > > > > > > > > > just > > > > > > > > > > > > > completed C++ parquet encryption module which > > > implements > > > > > > > > encryption > > > > > > > > > > in > > > > > > > > > > > low > > > > > > > > > > > > > level api and with examples included. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To have this feature completed, I would like to > thank > > > > Gidon > > > > > > and > > > > > > > > > > > Revital for > > > > > > > > > > > > > their contribution. > > > > > > > > > > > > > > > > > > > > > > > > > > Gidon had a key role in encryption design and in > > > writing > > > > Java > > > > > > > > > version > > > > > > > > > > > code > > > > > > > > > > > > > on which we based on to write C++ version. He also > > > wrote > > > > > > crypto > > > > > > > > > > > package in > > > > > > > > > > > > > C++ version. > > > > > > > > > > > > > > > > > > > > > > > > > > Revital and me has been joining together in > writing C++ > > > > > > > version. > > > > > > > > > > > Revital > > > > > > > > > > > > > was responsible for AAD calculations, API updating > (to > > > > be the > > > > > > > > same > > > > > > > > > > with > > > > > > > > > > > > > Java version) and Java-C++ inter-operation > testing. I > > > was > > > > > > > writing > > > > > > > > > the > > > > > > > > > > > first > > > > > > > > > > > > > draft (properties, metadata, writer, reader) and > > > keeping > > > > them > > > > > > > > > updated > > > > > > > > > > > when > > > > > > > > > > > > > crypto package change. > > > > > > > > > > > > > > > > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon > and > > > > > > Revital > > > > > > > > for > > > > > > > > > > all > > > > > > > > > > > > > guide and experience I have received from them, > too. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Here are the links of pull requests: > > > > > > > > > > > > > > > > > > > > > > > > > > 1) encryption module (properties, metadata, writer, > > > > reader): > > > > > > > > > > > > > https://github.com/apache/arrow/pull/2555. > > > > > > > > > > > > > > > > > > > > > > > > > > 2) some merged pull requests for new thrift > structure > > > and > > > > > > > crypto > > > > > > > > > > > algorithm, > > > > > > > > > > > > > and one still open: > > > > > > https://github.com/apache/arrow/pull/3520 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, in order to make (1) buildable with > current > > > > build > > > > > > > > scripts, > > > > > > > > > > we > > > > > > > > > > > need > > > > > > > > > > > > > “adding openssl in C++ build toolchain” which is > > > > mentioned in > > > > > > > > this > > > > > > > > > > > jira: > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. > I > > > > will be > > > > > > > > > grateful > > > > > > > > > > > if > > > > > > > > > > > > > someone could help fullfill this work. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > About current pull requests, they has been > currently > > > > using in > > > > > > > our > > > > > > > > > > > > > development phase at Emotiv ( > https://www.emotiv.com/). > > > > We > > > > > > love > > > > > > > > > using > > > > > > > > > > > > > parquet files to store EEG data. We are going to > > > release > > > > a > > > > > > > > product > > > > > > > > > > with > > > > > > > > > > > > > encrypted parquet files soon and look forward to > the > > > > official > > > > > > > > > release > > > > > > > > > > > of > > > > > > > > > > > > > parquet encryption feature. So it will be many > thank > > > and > > > > > > great > > > > > > > > > honor > > > > > > > > > > to > > > > > > > > > > > > > have you review and merge them (if qualified). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > > > > > > > > > > > > > > > Tham > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > regards, > > > > > > > > > > Deepak Majeti > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > regards, > > > > > > > > Deepak Majeti > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > regards, > > > > > > Deepak Majeti > > > > > > > > > > > > > > > > > > > -- > > regards, > > Deepak Majeti >
