I will make the first attempt to review the C++ code. I reviewed part of it
a while ago.
Java-C++ inter-operation testing is my main concern. Can you share some
details on what has been tested?
I need to update myself with the key management part and see if we can add
some test parquet files written in Java along with the keys to the data
folder.

On Wed, May 22, 2019 at 10:18 AM Wes McKinney <[email protected]> wrote:

> hi folks,
>
> I'm extraordinarily busy but it seems likely that the code review for
> Parquet C++ is largely going to fall on my shoulders. Since this is a
> large patch with many touch-points in the C++ codebase, and there
> hasn't been much feedback over the course of its development, I would
> like to leave thoughtful code review so that we can avoid issues that
> may lead to burdensome maintenance issues.
>
> On the OpenSSL issue -- there are many CMake projects in the wild that
> use OpenSSL and there are many examples of how to incorporate other
> dependencies into the Arrow build system. I don't have time to work on
> this right now so someone else will have to sort out the toolchain
> issue
>
> Thanks
>
> On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <[email protected]> wrote:
> >
> > <re-sending from another account>
> >
> > Tham, thank you for this! and for volunteering early for the C++ version
> > work, driving it forward and creating the bulk of the parquet-cpp
> > encryption code along the way.
> >
> > @All - this announcement means that two implementations of Parquet
> > encryption, fully conforming to the formal specification, are available
> > today. Thanks to Revital for contributing to C++ version compliance with
> > the encryption spec, and for running a set of basic Java-C++ encryption
> > interoperability tests. We have tested plaintext and encrypted footer
> > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files written
> > with parquet-cpp are successfully parsed by parquet-mr, and vice versa.
> >
> > Let me also thank Junjie, Nandor, Anna and Xinli for their support and
> vote
> > for the encryption specification - along with the PMC folks.
> >
> > All parquet-format pull requests are merged by now into the encryption
> > branch,
> > https://github.com/apache/parquet-format/tree/encryption
> >
> > The community is welcome to review the parquet-mr pull requests, in the
> > following order:
> > https://github.com/apache/parquet-mr/pull/613
> > https://github.com/apache/parquet-mr/pull/614
> > https://github.com/apache/parquet-mr/pull/643
> >
> >
> > Currently, an end-to-end implementation of Java (mr) Parquet encryption
> is
> > collected in this branch:
> > https://github.com/ggershinsky/parquet-mr/tree/encr
> > Thanks to Xinli for working with this branch code, and contributing to it
> > based on his field experience. Everybody is welcome to do the same.
> > @All - it would be helpful to review & merge the above PRs in
> > apache/parquet-mr/encryption, so that folks can work with it instead of
> my
> > private branch..
> >
> > And I certainly second Tham's call to review & merge the parquet-cpp pull
> > requests. By now, we have a number of companies starting to utilize
> Parquet
> > encryption (both C++ and Java), including IBM.
> >
> >
> > Cheers, Gidon.
> >
> >
> > On Mon, May 20, 2019 at 1:40 PM Tham Ha <[email protected]> wrote:
> >
> > > Hi community,
> > >
> > >
> > >
> > > After a long time of development, I'm honor to announce that we have
> just
> > > completed C++ parquet encryption module which implements encryption in
> low
> > > level api and with examples included.
> > >
> > >
> > >
> > > To have this feature completed, I would like to thank Gidon and
> Revital for
> > > their contribution.
> > >
> > > Gidon had a key role in encryption design and in writing Java version
> code
> > > on which we based on to write C++ version. He also wrote crypto
> package in
> > > C++ version.
> > >
> > > Revital and me has been joining together in writing C++ version.
> Revital
> > > was responsible for AAD calculations, API updating (to be the same with
> > > Java version) and Java-C++ inter-operation testing. I was writing the
> first
> > > draft (properties, metadata, writer, reader) and keeping them updated
> when
> > > crypto package change.
> > >
> > > We have had a great time to cooperate. Thank Gidon and Revital for all
> > > guide and experience I have received from them, too.
> > >
> > >
> > >
> > > Here are the links of pull requests:
> > >
> > > 1) encryption module (properties, metadata, writer, reader):
> > > https://github.com/apache/arrow/pull/2555.
> > >
> > > 2) some merged pull requests for new thrift structure and crypto
> algorithm,
> > > and one still open: https://github.com/apache/arrow/pull/3520
> > >
> > >
> > >
> > > However, in order to make (1) buildable with current build scripts, we
> need
> > > “adding openssl in C++ build toolchain” which is mentioned in this
> jira:
> > > https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful
> if
> > > someone could help fullfill this work.
> > >
> > >
> > >
> > > About current pull requests, they has been currently using in our
> > > development phase at Emotiv (https://www.emotiv.com/). We love using
> > > parquet files to store EEG data. We are going to release a product with
> > > encrypted parquet files soon and look forward to the official release
> of
> > > parquet encryption feature. So it will be many thank and great honor to
> > > have you review and merge them (if qualified).
> > >
> > >
> > >
> > > Thank you very much!
> > >
> > > Tham
> > >
>


-- 
regards,
Deepak Majeti

Reply via email to