Hi Gidon,

Sure. I’m working on it.

Regards, Tham

From: Gidon Gershinsky
Sent: Thursday, May 30, 2019 4:39 AM
To: [email protected]
Subject: Re: Announcement for c++ parquet encryption

Wes, thanks for reviewing and merging the 3520. Tham, can you please rebase
2555 to take these and Deepak's changes in, so your PR becomes the staging
branch.
Revital will then send an additional input to your branch. Afterwards, with
your changes and additions, please let us know when PR 2555 is ready for a
review.

Cheers, Gidon.

On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <[email protected]>
wrote:

> Hi Gidon,
>
> I agree with your point 2). I merged the PR for OpenSSL build support. We
> can do the same for PR-3520 after you push the changes this week. We can
> then simply rebase PR-2555 and make the final review.
> For 1) I would like to take a look at the inter-op testing framework first.
> Can you share the inter-op sample? Thanks.
>
>
> On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <[email protected]> wrote:
>
> > Hi Deepak,
> >
> > If it simplifies the code review, we can make this change. Actually,
> we're
> > already creating a single branch that has the parquet-cpp with encryption
> > in one place, ready for build and sample run. This is a private branch -
> > but we can do the same for the pull request (2555), making it the staging
> > branch. However, there are a number of technical details we need to
> address
> > first:
> >
> > 1) The private branch we're creating contains not only the cpp encryption
> > sample, but also an additional interop sample that checks
> interoperability
> > with parquet-mr encryption.
> > Do we want to add this interop sample to the staging branch? Should this
> > sample be merged in arrow/parquet-cpp for future interop testing with
> Java
> > encryption? Or this sample can stay in the staging branch for the
> duration
> > of the review process, and be removed just before the staging PR is
> merged
> > in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> > branch at all, and keep our private branch for interop testing? Let us
> know
> > your preference.
> >
> > 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> > and update the PR. It adds 2 new files, and doesnt touch the existing
> ones
> > - if you guys will be ok with the updated version, you can merge it, and
> > then the staging PR (2555) will automatically pick the 3520 patch upon
> > rebasing to the then-current master state. But, if further changes will
> be
> > required in the #3520 files, we'll add them to the staging PR.
> >
> > Cheers, Gidon.
> >
> > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <[email protected]>
> > wrote:
> >
> > > Hi Gidon and Tham,
> > >
> > > I opened a PR for OpenSSL build support here:
> > > https://github.com/apache/arrow/pull/4384/files
> > > However, your PRs need to be rebased/merged with master.
> > > Why don't we keep one staging PR for C++? Probably make
> > > https://github.com/apache/arrow/pull/2555 the staging PR and
> cherry-pick
> > > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> > it.
> > > The later PRs are tiny anyway. This simplifies code review, syncing
> with
> > > the master, as well as building the encryption work while reviewing.
> > >
> > >
> > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <[email protected]>
> > > wrote:
> > >
> > > > Thanks guys. We'll create a doc with a description of the current
> > interop
> > > > tests, including details on how to build/run them.
> > > >
> > > >
> > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > I will make the first attempt to review the C++ code. I reviewed
> part
> > > of
> > > > it
> > > > > a while ago.
> > > > > Java-C++ inter-operation testing is my main concern. Can you share
> > some
> > > > > details on what has been tested?
> > > > > I need to update myself with the key management part and see if we
> > can
> > > > add
> > > > > some test parquet files written in Java along with the keys to the
> > data
> > > > > folder.
> > > > >
> > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > hi folks,
> > > > > >
> > > > > > I'm extraordinarily busy but it seems likely that the code review
> > for
> > > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> > is a
> > > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > > hasn't been much feedback over the course of its development, I
> > would
> > > > > > like to leave thoughtful code review so that we can avoid issues
> > that
> > > > > > may lead to burdensome maintenance issues.
> > > > > >
> > > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > > that
> > > > > > use OpenSSL and there are many examples of how to incorporate
> other
> > > > > > dependencies into the Arrow build system. I don't have time to
> work
> > > on
> > > > > > this right now so someone else will have to sort out the
> toolchain
> > > > > > issue
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > <re-sending from another account>
> > > > > > >
> > > > > > > Tham, thank you for this! and for volunteering early for the
> C++
> > > > > version
> > > > > > > work, driving it forward and creating the bulk of the
> parquet-cpp
> > > > > > > encryption code along the way.
> > > > > > >
> > > > > > > @All - this announcement means that two implementations of
> > Parquet
> > > > > > > encryption, fully conforming to the formal specification, are
> > > > available
> > > > > > > today. Thanks to Revital for contributing to C++ version
> > compliance
> > > > > with
> > > > > > > the encryption spec, and for running a set of basic Java-C++
> > > > encryption
> > > > > > > interoperability tests. We have tested plaintext and encrypted
> > > footer
> > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> Files
> > > > > written
> > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> vice
> > > > versa.
> > > > > > >
> > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > support
> > > > and
> > > > > > vote
> > > > > > > for the encryption specification - along with the PMC folks.
> > > > > > >
> > > > > > > All parquet-format pull requests are merged by now into the
> > > > encryption
> > > > > > > branch,
> > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > >
> > > > > > > The community is welcome to review the parquet-mr pull
> requests,
> > in
> > > > the
> > > > > > > following order:
> > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > >
> > > > > > >
> > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > > encryption
> > > > > > is
> > > > > > > collected in this branch:
> > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > Thanks to Xinli for working with this branch code, and
> > contributing
> > > > to
> > > > > it
> > > > > > > based on his field experience. Everybody is welcome to do the
> > same.
> > > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > > instead
> > > > of
> > > > > > my
> > > > > > > private branch..
> > > > > > >
> > > > > > > And I certainly second Tham's call to review & merge the
> > > parquet-cpp
> > > > > pull
> > > > > > > requests. By now, we have a number of companies starting to
> > utilize
> > > > > > Parquet
> > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > >
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]>
> wrote:
> > > > > > >
> > > > > > > > Hi community,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > After a long time of development, I'm honor to announce that
> we
> > > > have
> > > > > > just
> > > > > > > > completed C++ parquet encryption module which implements
> > > encryption
> > > > > in
> > > > > > low
> > > > > > > > level api and with examples included.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > To have this feature completed, I would like to thank Gidon
> and
> > > > > > Revital for
> > > > > > > > their contribution.
> > > > > > > >
> > > > > > > > Gidon had a key role in encryption design and in writing Java
> > > > version
> > > > > > code
> > > > > > > > on which we based on to write C++ version. He also wrote
> crypto
> > > > > > package in
> > > > > > > > C++ version.
> > > > > > > >
> > > > > > > > Revital and me has been joining together in writing C++
> > version.
> > > > > > Revital
> > > > > > > > was responsible for AAD calculations, API updating (to be the
> > > same
> > > > > with
> > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > writing
> > > > the
> > > > > > first
> > > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > > updated
> > > > > > when
> > > > > > > > crypto package change.
> > > > > > > >
> > > > > > > > We have had a great time to cooperate. Thank Gidon and
> Revital
> > > for
> > > > > all
> > > > > > > > guide and experience I have received from them, too.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Here are the links of pull requests:
> > > > > > > >
> > > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > >
> > > > > > > > 2) some merged pull requests for new thrift structure and
> > crypto
> > > > > > algorithm,
> > > > > > > > and one still open:
> https://github.com/apache/arrow/pull/3520
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > However, in order to make (1) buildable with current build
> > > scripts,
> > > > > we
> > > > > > need
> > > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > > this
> > > > > > jira:
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > > grateful
> > > > > > if
> > > > > > > > someone could help fullfill this work.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > About current pull requests, they has been currently using in
> > our
> > > > > > > > development phase at Emotiv (https://www.emotiv.com/). We
> love
> > > > using
> > > > > > > > parquet files to store EEG data. We are going to release a
> > > product
> > > > > with
> > > > > > > > encrypted parquet files soon and look forward to the official
> > > > release
> > > > > > of
> > > > > > > > parquet encryption feature. So it will be many thank and
> great
> > > > honor
> > > > > to
> > > > > > > > have you review and merge them (if qualified).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > >
> > > > > > > > Tham
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
> >
>
>
> --
> regards,
> Deepak Majeti
>

Reply via email to