Thinking of this, it occurred to me that the words "interop" and "test"
might be actually misleading in this context.
What we have, is a C++ sample, similar to the current reader-writer.cc or
the new encryption-reader-writer.cc (both located under the
cpp/examples/parquet/low-level-api).
The encryption-reader-writer.cc  writes one file using one encryption
config, and reads it with one decryption config.
Our sample is the same, but it writes N files with N different encryption
configs, and reads all these files using M different decryption configs.
All encryption/decryption configs are meaningful, we're adding detailed
comments to each, so this sample becomes an "encryption HelloWorld" on
steroids.
A good place to see the code usage of API for different
encryption/decryption modes.

Now, since we have an identical sample in Java, we can test the
interoperability - when files are produced by Java sample and read by C++
sample, or vice versa.

Given the first part, maybe we should place it together with other samples,
under cpp/examples/parquet/low-level-api in the apache/arrow repo.
This has an added benefit of getting built together with arrow/parquet,
without dependency on other repos.

Here goes the current version:
https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-reader-writer-all-crypto-options.cc


Cheers, Gidon.

On Thu, May 30, 2019 at 12:45 AM Wes McKinney <[email protected]> wrote:

> If there are encryption integration testing utilities there's not much
> reason not to put them in
>
> https://github.com/apache/arrow/tree/master/cpp/tools/parquet
>
> (or similar)
>
> Then there is no additional build system to maintain
>
> On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <[email protected]>
> wrote:
> >
> > Hi Gidon,
> >
> > We should definitely add encrypted parquet files written by parquet-mr in
> > the parquet-testing repo data folder.
> > I also like the idea of keeping the source files in a separate "src"
> folder
> > in that repository. I see no downside in doing so.
> >
> > On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <[email protected]>
> wrote:
> >
> > > Hi Wes,
> > >
> > > Yep, I meant private as in not under apache/arrow. Will be more careful
> > > with that term.
> > >
> > > https://github.com/apache/parquet-testing  seems to currently contain
> only
> > > data (.parquet) files.
> > > Our samples are C++ and Java code files, that write and read .parquet
> files
> > > in different encryption configurations.
> > > We can add these encrypted .parquet files to the apache/parquet-testing
> > > repo.
> > > Let us know if we also should add the samples themselves there (by
> creating
> > > src folders in this repo), or add them to the apache/arrow/mr repos,
> or to
> > > keep them in our personal repos (the ones I've called private before).
> > > These samples are based on the standard parquet samples (such as
> > >
> > >
> https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> > > ),
> > > with a loop that creates different encryption configs, and
> writes/reads the
> > > same data with them.
> > >
> > > Cheers, Gidon.
> > >
> > >
> > > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <[email protected]>
> wrote:
> > >
> > > > hi Gidon,
> > > >
> > > > Surely you don't mean "private" private -- this was always on GitHub
> > > > right (otherwise this runs against the Apache Way)?
> > > >
> > > > I think we should try to have all of the testing code in the upstream
> > > > repositories. If there are sample files you need to check in (and
> they
> > > > are not too large), please use the apache/parquet-testing repository
> > > >
> > > > - Wes
> > > >
> > > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <[email protected]>
> > > wrote:
> > > > >
> > > > > Hi Deepak,
> > > > >
> > > > > Sounds good. Here goes the current cpp interop sample (not final
> yet).
> > > We
> > > > > have a sample with a similar code in Java.
> > > > >
> > > >
> > >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > > > >
> > > > >
> > > > > Please let me know if we should add this file to the cpp PR and its
> > > > staging
> > > > > branch (under parquet/low-level-api, or another folder?)
> > > > > - or leave this file in our private branch.
> > > > >
> > > > > Cheers, Gidon.
> > > > >
> > > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <
> [email protected]
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Gidon,
> > > > > >
> > > > > > I agree with your point 2). I merged the PR for OpenSSL build
> > > support.
> > > > We
> > > > > > can do the same for PR-3520 after you push the changes this
> week. We
> > > > can
> > > > > > then simply rebase PR-2555 and make the final review.
> > > > > > For 1) I would like to take a look at the inter-op testing
> framework
> > > > first.
> > > > > > Can you share the inter-op sample? Thanks.
> > > > > >
> > > > > >
> > > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <
> [email protected]>
> > > > wrote:
> > > > > >
> > > > > > > Hi Deepak,
> > > > > > >
> > > > > > > If it simplifies the code review, we can make this change.
> > > Actually,
> > > > > > we're
> > > > > > > already creating a single branch that has the parquet-cpp with
> > > > encryption
> > > > > > > in one place, ready for build and sample run. This is a private
> > > > branch -
> > > > > > > but we can do the same for the pull request (2555), making it
> the
> > > > staging
> > > > > > > branch. However, there are a number of technical details we
> need to
> > > > > > address
> > > > > > > first:
> > > > > > >
> > > > > > > 1) The private branch we're creating contains not only the cpp
> > > > encryption
> > > > > > > sample, but also an additional interop sample that checks
> > > > > > interoperability
> > > > > > > with parquet-mr encryption.
> > > > > > > Do we want to add this interop sample to the staging branch?
> Should
> > > > this
> > > > > > > sample be merged in arrow/parquet-cpp for future interop
> testing
> > > with
> > > > > > Java
> > > > > > > encryption? Or this sample can stay in the staging branch for
> the
> > > > > > duration
> > > > > > > of the review process, and be removed just before the staging
> PR is
> > > > > > merged
> > > > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> > > > staging
> > > > > > > branch at all, and keep our private branch for interop
> testing? Let
> > > > us
> > > > > > know
> > > > > > > your preference.
> > > > > > >
> > > > > > > 2) PR 3520 has been reviewed. We'll make the requested changes
> this
> > > > week,
> > > > > > > and update the PR. It adds 2 new files, and doesnt touch the
> > > existing
> > > > > > ones
> > > > > > > - if you guys will be ok with the updated version, you can
> merge
> > > it,
> > > > and
> > > > > > > then the staging PR (2555) will automatically pick the 3520
> patch
> > > > upon
> > > > > > > rebasing to the then-current master state. But, if further
> changes
> > > > will
> > > > > > be
> > > > > > > required in the #3520 files, we'll add them to the staging PR.
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Gidon and Tham,
> > > > > > > >
> > > > > > > > I opened a PR for OpenSSL build support here:
> > > > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > > > > cherry-pick
> > > > > > > > https://github.com/apache/arrow/pull/3520/files and the
> above PR
> > > > onto
> > > > > > > it.
> > > > > > > > The later PRs are tiny anyway. This simplifies code review,
> > > syncing
> > > > > > with
> > > > > > > > the master, as well as building the encryption work while
> > > > reviewing.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks guys. We'll create a doc with a description of the
> > > current
> > > > > > > interop
> > > > > > > > > tests, including details on how to build/run them.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I will make the first attempt to review the C++ code. I
> > > > reviewed
> > > > > > part
> > > > > > > > of
> > > > > > > > > it
> > > > > > > > > > a while ago.
> > > > > > > > > > Java-C++ inter-operation testing is my main concern. Can
> you
> > > > share
> > > > > > > some
> > > > > > > > > > details on what has been tested?
> > > > > > > > > > I need to update myself with the key management part and
> see
> > > > if we
> > > > > > > can
> > > > > > > > > add
> > > > > > > > > > some test parquet files written in Java along with the
> keys
> > > to
> > > > the
> > > > > > > data
> > > > > > > > > > folder.
> > > > > > > > > >
> > > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > > > [email protected]
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > hi folks,
> > > > > > > > > > >
> > > > > > > > > > > I'm extraordinarily busy but it seems likely that the
> code
> > > > review
> > > > > > > for
> > > > > > > > > > > Parquet C++ is largely going to fall on my shoulders.
> Since
> > > > this
> > > > > > > is a
> > > > > > > > > > > large patch with many touch-points in the C++
> codebase, and
> > > > there
> > > > > > > > > > > hasn't been much feedback over the course of its
> > > > development, I
> > > > > > > would
> > > > > > > > > > > like to leave thoughtful code review so that we can
> avoid
> > > > issues
> > > > > > > that
> > > > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > > > >
> > > > > > > > > > > On the OpenSSL issue -- there are many CMake projects
> in
> > > the
> > > > wild
> > > > > > > > that
> > > > > > > > > > > use OpenSSL and there are many examples of how to
> > > incorporate
> > > > > > other
> > > > > > > > > > > dependencies into the Arrow build system. I don't have
> time
> > > > to
> > > > > > work
> > > > > > > > on
> > > > > > > > > > > this right now so someone else will have to sort out
> the
> > > > > > toolchain
> > > > > > > > > > > issue
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > <re-sending from another account>
> > > > > > > > > > > >
> > > > > > > > > > > > Tham, thank you for this! and for volunteering early
> for
> > > > the
> > > > > > C++
> > > > > > > > > > version
> > > > > > > > > > > > work, driving it forward and creating the bulk of the
> > > > > > parquet-cpp
> > > > > > > > > > > > encryption code along the way.
> > > > > > > > > > > >
> > > > > > > > > > > > @All - this announcement means that two
> implementations
> > > of
> > > > > > > Parquet
> > > > > > > > > > > > encryption, fully conforming to the formal
> specification,
> > > > are
> > > > > > > > > available
> > > > > > > > > > > > today. Thanks to Revital for contributing to C++
> version
> > > > > > > compliance
> > > > > > > > > > with
> > > > > > > > > > > > the encryption spec, and for running a set of basic
> > > > Java-C++
> > > > > > > > > encryption
> > > > > > > > > > > > interoperability tests. We have tested plaintext and
> > > > encrypted
> > > > > > > > footer
> > > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> > > readers.
> > > > > > Files
> > > > > > > > > > written
> > > > > > > > > > > > with parquet-cpp are successfully parsed by
> parquet-mr,
> > > and
> > > > > > vice
> > > > > > > > > versa.
> > > > > > > > > > > >
> > > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for
> > > their
> > > > > > > support
> > > > > > > > > and
> > > > > > > > > > > vote
> > > > > > > > > > > > for the encryption specification - along with the PMC
> > > > folks.
> > > > > > > > > > > >
> > > > > > > > > > > > All parquet-format pull requests are merged by now
> into
> > > the
> > > > > > > > > encryption
> > > > > > > > > > > > branch,
> > > > > > > > > > > >
> https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > > > >
> > > > > > > > > > > > The community is welcome to review the parquet-mr
> pull
> > > > > > requests,
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > following order:
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> > > > Parquet
> > > > > > > > > encryption
> > > > > > > > > > > is
> > > > > > > > > > > > collected in this branch:
> > > > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > > > Thanks to Xinli for working with this branch code,
> and
> > > > > > > contributing
> > > > > > > > > to
> > > > > > > > > > it
> > > > > > > > > > > > based on his field experience. Everybody is welcome
> to do
> > > > the
> > > > > > > same.
> > > > > > > > > > > > @All - it would be helpful to review & merge the
> above
> > > PRs
> > > > in
> > > > > > > > > > > > apache/parquet-mr/encryption, so that folks can work
> with
> > > > it
> > > > > > > > instead
> > > > > > > > > of
> > > > > > > > > > > my
> > > > > > > > > > > > private branch..
> > > > > > > > > > > >
> > > > > > > > > > > > And I certainly second Tham's call to review & merge
> the
> > > > > > > > parquet-cpp
> > > > > > > > > > pull
> > > > > > > > > > > > requests. By now, we have a number of companies
> starting
> > > to
> > > > > > > utilize
> > > > > > > > > > > Parquet
> > > > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers, Gidon.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <
> [email protected]
> > > >
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi community,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > After a long time of development, I'm honor to
> announce
> > > > that
> > > > > > we
> > > > > > > > > have
> > > > > > > > > > > just
> > > > > > > > > > > > > completed C++ parquet encryption module which
> > > implements
> > > > > > > > encryption
> > > > > > > > > > in
> > > > > > > > > > > low
> > > > > > > > > > > > > level api and with examples included.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > To have this feature completed, I would like to
> thank
> > > > Gidon
> > > > > > and
> > > > > > > > > > > Revital for
> > > > > > > > > > > > > their contribution.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Gidon had a key role in encryption design and in
> > > writing
> > > > Java
> > > > > > > > > version
> > > > > > > > > > > code
> > > > > > > > > > > > > on which we based on to write C++ version. He also
> > > wrote
> > > > > > crypto
> > > > > > > > > > > package in
> > > > > > > > > > > > > C++ version.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Revital and me has been joining together in
> writing C++
> > > > > > > version.
> > > > > > > > > > > Revital
> > > > > > > > > > > > > was responsible for AAD calculations, API updating
> (to
> > > > be the
> > > > > > > > same
> > > > > > > > > > with
> > > > > > > > > > > > > Java version) and Java-C++ inter-operation
> testing. I
> > > was
> > > > > > > writing
> > > > > > > > > the
> > > > > > > > > > > first
> > > > > > > > > > > > > draft (properties, metadata, writer, reader) and
> > > keeping
> > > > them
> > > > > > > > > updated
> > > > > > > > > > > when
> > > > > > > > > > > > > crypto package change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon
> and
> > > > > > Revital
> > > > > > > > for
> > > > > > > > > > all
> > > > > > > > > > > > > guide and experience I have received from them,
> too.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) encryption module (properties, metadata, writer,
> > > > reader):
> > > > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) some merged pull requests for new thrift
> structure
> > > and
> > > > > > > crypto
> > > > > > > > > > > algorithm,
> > > > > > > > > > > > > and one still open:
> > > > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, in order to make (1) buildable with
> current
> > > > build
> > > > > > > > scripts,
> > > > > > > > > > we
> > > > > > > > > > > need
> > > > > > > > > > > > > “adding openssl in C++ build toolchain” which is
> > > > mentioned in
> > > > > > > > this
> > > > > > > > > > > jira:
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302.
> I
> > > > will be
> > > > > > > > > grateful
> > > > > > > > > > > if
> > > > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > About current pull requests, they has been
> currently
> > > > using in
> > > > > > > our
> > > > > > > > > > > > > development phase at Emotiv (
> https://www.emotiv.com/).
> > > > We
> > > > > > love
> > > > > > > > > using
> > > > > > > > > > > > > parquet files to store EEG data. We are going to
> > > release
> > > > a
> > > > > > > > product
> > > > > > > > > > with
> > > > > > > > > > > > > encrypted parquet files soon and look forward to
> the
> > > > official
> > > > > > > > > release
> > > > > > > > > > > of
> > > > > > > > > > > > > parquet encryption feature. So it will be many
> thank
> > > and
> > > > > > great
> > > > > > > > > honor
> > > > > > > > > > to
> > > > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you very much!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tham
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > regards,
> > > > > > > > > > Deepak Majeti
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > regards,
> > > > > > > > Deepak Majeti
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > regards,
> > > > > > Deepak Majeti
> > > > > >
> > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
>

Reply via email to