Hi Gidon,

I agree with your point 2). I merged the PR for OpenSSL build support. We
can do the same for PR-3520 after you push the changes this week. We can
then simply rebase PR-2555 and make the final review.
For 1) I would like to take a look at the inter-op testing framework first.
Can you share the inter-op sample? Thanks.


On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <[email protected]> wrote:

> Hi Deepak,
>
> If it simplifies the code review, we can make this change. Actually, we're
> already creating a single branch that has the parquet-cpp with encryption
> in one place, ready for build and sample run. This is a private branch -
> but we can do the same for the pull request (2555), making it the staging
> branch. However, there are a number of technical details we need to address
> first:
>
> 1) The private branch we're creating contains not only the cpp encryption
> sample, but also an additional interop sample that checks interoperability
> with parquet-mr encryption.
> Do we want to add this interop sample to the staging branch? Should this
> sample be merged in arrow/parquet-cpp for future interop testing with Java
> encryption? Or this sample can stay in the staging branch for the duration
> of the review process, and be removed just before the staging PR is merged
> in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> branch at all, and keep our private branch for interop testing? Let us know
> your preference.
>
> 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> and update the PR. It adds 2 new files, and doesnt touch the existing ones
> - if you guys will be ok with the updated version, you can merge it, and
> then the staging PR (2555) will automatically pick the 3520 patch upon
> rebasing to the then-current master state. But, if further changes will be
> required in the #3520 files, we'll add them to the staging PR.
>
> Cheers, Gidon.
>
> On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <[email protected]>
> wrote:
>
> > Hi Gidon and Tham,
> >
> > I opened a PR for OpenSSL build support here:
> > https://github.com/apache/arrow/pull/4384/files
> > However, your PRs need to be rebased/merged with master.
> > Why don't we keep one staging PR for C++? Probably make
> > https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick
> > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> it.
> > The later PRs are tiny anyway. This simplifies code review, syncing with
> > the master, as well as building the encryption work while reviewing.
> >
> >
> > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <[email protected]>
> > wrote:
> >
> > > Thanks guys. We'll create a doc with a description of the current
> interop
> > > tests, including details on how to build/run them.
> > >
> > >
> > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <[email protected]
> >
> > > wrote:
> > >
> > > > I will make the first attempt to review the C++ code. I reviewed part
> > of
> > > it
> > > > a while ago.
> > > > Java-C++ inter-operation testing is my main concern. Can you share
> some
> > > > details on what has been tested?
> > > > I need to update myself with the key management part and see if we
> can
> > > add
> > > > some test parquet files written in Java along with the keys to the
> data
> > > > folder.
> > > >
> > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <[email protected]>
> > > wrote:
> > > >
> > > > > hi folks,
> > > > >
> > > > > I'm extraordinarily busy but it seems likely that the code review
> for
> > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> is a
> > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > hasn't been much feedback over the course of its development, I
> would
> > > > > like to leave thoughtful code review so that we can avoid issues
> that
> > > > > may lead to burdensome maintenance issues.
> > > > >
> > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > that
> > > > > use OpenSSL and there are many examples of how to incorporate other
> > > > > dependencies into the Arrow build system. I don't have time to work
> > on
> > > > > this right now so someone else will have to sort out the toolchain
> > > > > issue
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > <re-sending from another account>
> > > > > >
> > > > > > Tham, thank you for this! and for volunteering early for the C++
> > > > version
> > > > > > work, driving it forward and creating the bulk of the parquet-cpp
> > > > > > encryption code along the way.
> > > > > >
> > > > > > @All - this announcement means that two implementations of
> Parquet
> > > > > > encryption, fully conforming to the formal specification, are
> > > available
> > > > > > today. Thanks to Revital for contributing to C++ version
> compliance
> > > > with
> > > > > > the encryption spec, and for running a set of basic Java-C++
> > > encryption
> > > > > > interoperability tests. We have tested plaintext and encrypted
> > footer
> > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> > > > written
> > > > > > with parquet-cpp are successfully parsed by parquet-mr, and vice
> > > versa.
> > > > > >
> > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> support
> > > and
> > > > > vote
> > > > > > for the encryption specification - along with the PMC folks.
> > > > > >
> > > > > > All parquet-format pull requests are merged by now into the
> > > encryption
> > > > > > branch,
> > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > >
> > > > > > The community is welcome to review the parquet-mr pull requests,
> in
> > > the
> > > > > > following order:
> > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > >
> > > > > >
> > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > encryption
> > > > > is
> > > > > > collected in this branch:
> > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > Thanks to Xinli for working with this branch code, and
> contributing
> > > to
> > > > it
> > > > > > based on his field experience. Everybody is welcome to do the
> same.
> > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > instead
> > > of
> > > > > my
> > > > > > private branch..
> > > > > >
> > > > > > And I certainly second Tham's call to review & merge the
> > parquet-cpp
> > > > pull
> > > > > > requests. By now, we have a number of companies starting to
> utilize
> > > > > Parquet
> > > > > > encryption (both C++ and Java), including IBM.
> > > > > >
> > > > > >
> > > > > > Cheers, Gidon.
> > > > > >
> > > > > >
> > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]> wrote:
> > > > > >
> > > > > > > Hi community,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > After a long time of development, I'm honor to announce that we
> > > have
> > > > > just
> > > > > > > completed C++ parquet encryption module which implements
> > encryption
> > > > in
> > > > > low
> > > > > > > level api and with examples included.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > To have this feature completed, I would like to thank Gidon and
> > > > > Revital for
> > > > > > > their contribution.
> > > > > > >
> > > > > > > Gidon had a key role in encryption design and in writing Java
> > > version
> > > > > code
> > > > > > > on which we based on to write C++ version. He also wrote crypto
> > > > > package in
> > > > > > > C++ version.
> > > > > > >
> > > > > > > Revital and me has been joining together in writing C++
> version.
> > > > > Revital
> > > > > > > was responsible for AAD calculations, API updating (to be the
> > same
> > > > with
> > > > > > > Java version) and Java-C++ inter-operation testing. I was
> writing
> > > the
> > > > > first
> > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > updated
> > > > > when
> > > > > > > crypto package change.
> > > > > > >
> > > > > > > We have had a great time to cooperate. Thank Gidon and Revital
> > for
> > > > all
> > > > > > > guide and experience I have received from them, too.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Here are the links of pull requests:
> > > > > > >
> > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > >
> > > > > > > 2) some merged pull requests for new thrift structure and
> crypto
> > > > > algorithm,
> > > > > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > However, in order to make (1) buildable with current build
> > scripts,
> > > > we
> > > > > need
> > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > this
> > > > > jira:
> > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > grateful
> > > > > if
> > > > > > > someone could help fullfill this work.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > About current pull requests, they has been currently using in
> our
> > > > > > > development phase at Emotiv (https://www.emotiv.com/). We love
> > > using
> > > > > > > parquet files to store EEG data. We are going to release a
> > product
> > > > with
> > > > > > > encrypted parquet files soon and look forward to the official
> > > release
> > > > > of
> > > > > > > parquet encryption feature. So it will be many thank and great
> > > honor
> > > > to
> > > > > > > have you review and merge them (if qualified).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you very much!
> > > > > > >
> > > > > > > Tham
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Deepak Majeti
> > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
> >
>


-- 
regards,
Deepak Majeti

Reply via email to