Hi Gidon and Tham,

I opened a PR for OpenSSL build support here:
https://github.com/apache/arrow/pull/4384/files
However, your PRs need to be rebased/merged with master.
Why don't we keep one staging PR for C++? Probably make
https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick
https://github.com/apache/arrow/pull/3520/files and the above PR onto it.
The later PRs are tiny anyway. This simplifies code review, syncing with
the master, as well as building the encryption work while reviewing.


On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <[email protected]> wrote:

> Thanks guys. We'll create a doc with a description of the current interop
> tests, including details on how to build/run them.
>
>
> On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <[email protected]>
> wrote:
>
> > I will make the first attempt to review the C++ code. I reviewed part of
> it
> > a while ago.
> > Java-C++ inter-operation testing is my main concern. Can you share some
> > details on what has been tested?
> > I need to update myself with the key management part and see if we can
> add
> > some test parquet files written in Java along with the keys to the data
> > folder.
> >
> > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <[email protected]>
> wrote:
> >
> > > hi folks,
> > >
> > > I'm extraordinarily busy but it seems likely that the code review for
> > > Parquet C++ is largely going to fall on my shoulders. Since this is a
> > > large patch with many touch-points in the C++ codebase, and there
> > > hasn't been much feedback over the course of its development, I would
> > > like to leave thoughtful code review so that we can avoid issues that
> > > may lead to burdensome maintenance issues.
> > >
> > > On the OpenSSL issue -- there are many CMake projects in the wild that
> > > use OpenSSL and there are many examples of how to incorporate other
> > > dependencies into the Arrow build system. I don't have time to work on
> > > this right now so someone else will have to sort out the toolchain
> > > issue
> > >
> > > Thanks
> > >
> > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <[email protected]>
> > wrote:
> > > >
> > > > <re-sending from another account>
> > > >
> > > > Tham, thank you for this! and for volunteering early for the C++
> > version
> > > > work, driving it forward and creating the bulk of the parquet-cpp
> > > > encryption code along the way.
> > > >
> > > > @All - this announcement means that two implementations of Parquet
> > > > encryption, fully conforming to the formal specification, are
> available
> > > > today. Thanks to Revital for contributing to C++ version compliance
> > with
> > > > the encryption spec, and for running a set of basic Java-C++
> encryption
> > > > interoperability tests. We have tested plaintext and encrypted footer
> > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> > written
> > > > with parquet-cpp are successfully parsed by parquet-mr, and vice
> versa.
> > > >
> > > > Let me also thank Junjie, Nandor, Anna and Xinli for their support
> and
> > > vote
> > > > for the encryption specification - along with the PMC folks.
> > > >
> > > > All parquet-format pull requests are merged by now into the
> encryption
> > > > branch,
> > > > https://github.com/apache/parquet-format/tree/encryption
> > > >
> > > > The community is welcome to review the parquet-mr pull requests, in
> the
> > > > following order:
> > > > https://github.com/apache/parquet-mr/pull/613
> > > > https://github.com/apache/parquet-mr/pull/614
> > > > https://github.com/apache/parquet-mr/pull/643
> > > >
> > > >
> > > > Currently, an end-to-end implementation of Java (mr) Parquet
> encryption
> > > is
> > > > collected in this branch:
> > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > Thanks to Xinli for working with this branch code, and contributing
> to
> > it
> > > > based on his field experience. Everybody is welcome to do the same.
> > > > @All - it would be helpful to review & merge the above PRs in
> > > > apache/parquet-mr/encryption, so that folks can work with it instead
> of
> > > my
> > > > private branch..
> > > >
> > > > And I certainly second Tham's call to review & merge the parquet-cpp
> > pull
> > > > requests. By now, we have a number of companies starting to utilize
> > > Parquet
> > > > encryption (both C++ and Java), including IBM.
> > > >
> > > >
> > > > Cheers, Gidon.
> > > >
> > > >
> > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]> wrote:
> > > >
> > > > > Hi community,
> > > > >
> > > > >
> > > > >
> > > > > After a long time of development, I'm honor to announce that we
> have
> > > just
> > > > > completed C++ parquet encryption module which implements encryption
> > in
> > > low
> > > > > level api and with examples included.
> > > > >
> > > > >
> > > > >
> > > > > To have this feature completed, I would like to thank Gidon and
> > > Revital for
> > > > > their contribution.
> > > > >
> > > > > Gidon had a key role in encryption design and in writing Java
> version
> > > code
> > > > > on which we based on to write C++ version. He also wrote crypto
> > > package in
> > > > > C++ version.
> > > > >
> > > > > Revital and me has been joining together in writing C++ version.
> > > Revital
> > > > > was responsible for AAD calculations, API updating (to be the same
> > with
> > > > > Java version) and Java-C++ inter-operation testing. I was writing
> the
> > > first
> > > > > draft (properties, metadata, writer, reader) and keeping them
> updated
> > > when
> > > > > crypto package change.
> > > > >
> > > > > We have had a great time to cooperate. Thank Gidon and Revital for
> > all
> > > > > guide and experience I have received from them, too.
> > > > >
> > > > >
> > > > >
> > > > > Here are the links of pull requests:
> > > > >
> > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > https://github.com/apache/arrow/pull/2555.
> > > > >
> > > > > 2) some merged pull requests for new thrift structure and crypto
> > > algorithm,
> > > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > > >
> > > > >
> > > > >
> > > > > However, in order to make (1) buildable with current build scripts,
> > we
> > > need
> > > > > “adding openssl in C++ build toolchain” which is mentioned in this
> > > jira:
> > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> grateful
> > > if
> > > > > someone could help fullfill this work.
> > > > >
> > > > >
> > > > >
> > > > > About current pull requests, they has been currently using in our
> > > > > development phase at Emotiv (https://www.emotiv.com/). We love
> using
> > > > > parquet files to store EEG data. We are going to release a product
> > with
> > > > > encrypted parquet files soon and look forward to the official
> release
> > > of
> > > > > parquet encryption feature. So it will be many thank and great
> honor
> > to
> > > > > have you review and merge them (if qualified).
> > > > >
> > > > >
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > > Tham
> > > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
> >
>


-- 
regards,
Deepak Majeti

Reply via email to