+1. The Thrift structures are private to parquet-cpp, so it should not
be an issue
On Wed, Sep 26, 2018 at 12:23 PM Zoltan Ivanfi <z...@cloudera.com.invalid> 
wrote:
>
> Hi,
>
> I just had a conversation with Nandor and he pointed out to me that even if
> we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1
> version that fixes it. The important thing is that _writing_ is good and
> parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have to
> worry about breaking changes that would make existing data files unreadable.
>
> Based on this, I would [once again :)] suggest moving forward as planned.
>
> Thanks,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi <z...@cloudera.com> wrote:
>
> > Hi,
> >
> > Please let me know your opinions as well. So far all concerns were only
> > raised by me, which may indicate that other community members do not
> > consider this issue serious and in this case my suggestions may be
> > excessive and unjustified.
> >
> > Just to clarify: The data structures for the encrypion feature are
> > unlikely to change. I solely suggest these measures to remedy the situation
> > of parquet.thrift having been released in parquet-cpp before officially
> > getting released in parquet-format.
> >
> > Thanks,
> >
> > Zoltan
> >
> > On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi <z...@cloudera.com> wrote:
> >
> >> Hi,
> >>
> >> I think it's safer if we skip id 8 altogether and use id 9 for the new
> >> crypto structure. This way we don't have to worry about remaining backwards
> >> compatible with the accidentally released structure.
> >>
> >> Br,
> >>
> >> Zoltan
> >>
> >> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg5...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I think we should use this id for its current purpose. This field had
> >>> been
> >>> defined and merged months ago, and should be there is any scenario.
> >>> Except for the last week's change, the encryption format had been stable
> >>> for a while now. The timing of this change was unfortunate; but the
> >>> change
> >>> was minor and done for a good reason.
> >>> I hope there won't be new disturbances from now on.
> >>>
> >>> Cheers, Gidon.
> >>>
> >>>
> >>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <z...@cloudera.com.invalid>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > It seems that I spoke too early. I just noticed that a new field was
> >>> added
> >>> > to the ColumnChunk struct in
> >>> >
> >>> >
> >>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> >>> > Although the field is optional, we can't pretend it was never released,
> >>> > because parquet-cpp already expects that field for the id 8.
> >>> Therefore, if
> >>> > we were to add a different field with id 8 to the ColumnChunk struct,
> >>> the
> >>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> >>> > reading new files, because it would try to read the ColumnChunk struct
> >>> > including all of its fields (regardless of whether the code actually
> >>> uses
> >>> > them or not) and the field with id 8 would not match its Thrift schema.
> >>> >
> >>> > We still can avoid breaking changes in the ColumnChunk struct by either
> >>> > using id 8 for its current purpose or not using it at all (skip id 8
> >>> and
> >>> > use 9 for the next field we add).
> >>> >
> >>> > Br,
> >>> >
> >>> > Zoltan
> >>> >
> >>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <z...@cloudera.com> wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > If the encryption code release in parquet-cpp is unused at this
> >>> moment
> >>> > > then I think we are fine. It means that we are still free to decide
> >>> any
> >>> > way
> >>> > > about the data structures without the risk of incompatility issues.
> >>> In
> >>> > this
> >>> > > case I would suggest to proceed as we planned at the Parquet sync.
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Zoltan
> >>> > >
> >>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg5...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > >> That's correct. A layer that runs the crypto classes to encrypt
> >>> pages
> >>> > and
> >>> > >> structures is not merged yet. Even if the parquet.thrift is out,
> >>> there
> >>> > is
> >>> > >> virtually no chance encrypted files are created with it, and
> >>> certainly
> >>> > not
> >>> > >> in production.
> >>> > >>
> >>> > >> Given a (hopefully mild) headache this last-minute
> >>> feature/requirement
> >>> > has
> >>> > >> caused, let me add details on where it is coming from. I received
> >>> the
> >>> > >> requirement from a company that got an internal pushback from teams
> >>> > >> working
> >>> > >> with unencrypted columns only, and unwilling to update Parquet libs.
> >>> > They
> >>> > >> need support for a transition period for the current readers.
> >>> > >>
> >>> > >> We have an option to say no, and stick to the original goal list.
> >>> But I
> >>> > >> think the requirement makes sense, even if coming very late - and
> >>> will
> >>> > >> significantly ease adoption of the Parquet encryption. Also, it
> >>> turns
> >>> > out
> >>> > >> to be easy to implement, with minor changes in the code.
> >>> > >>
> >>> > >> In any case, its certainly a good idea to keep a single copy of
> >>> > >> parquet.thrift in the format repo.
> >>> > >>
> >>> > >> Cheers, Gidon
> >>> > >>
> >>> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <wesmck...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> >>> > >> > reading and writing files with encryption is not possible. Am I
> >>> wrong
> >>> > >> > about that?
> >>> > >> >
> >>> > >> > I'm wholly supportive of changing the project to use a released
> >>> > >> > version of the Parquet format, so let's do that ASAP.
> >>> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <
> >>> gg5...@gmail.com>
> >>> > >> wrote:
> >>> > >> > >
> >>> > >> > > Yep! (sent in parallel :)
> >>> > >> > >
> >>> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> >>> > <z...@cloudera.com.invalid
> >>> > >> >
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > > > Hi,
> >>> > >> > > >
> >>> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> >>> > >> although
> >>> > >> > in
> >>> > >> > > > its current form it is a breaking change, it can be easily
> >>> > >> rewritten to
> >>> > >> > > > become backwards-compatible so this part of the problem does
> >>> not
> >>> > >> apply
> >>> > >> > any
> >>> > >> > > > more.
> >>> > >> > > >
> >>> > >> > > > Br,
> >>> > >> > > >
> >>> > >> > > > Zoltan
> >>> > >> > > >
> >>> > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <
> >>> z...@cloudera.com>
> >>> > >> wrote:
> >>> > >> > > >
> >>> > >> > > > > Hi,
> >>> > >> > > > >
> >>> > >> > > > > On the Parquet sync we discussed that the practice of
> >>> > maintaining
> >>> > >> a
> >>> > >> > copy
> >>> > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we
> >>> must
> >>> > >> take
> >>> > >> > care
> >>> > >> > > > to
> >>> > >> > > > > not release parquet-format changes in parquet-cpp before we
> >>> > >> > officially
> >>> > >> > > > > release them in parquet-format. As I got back to my
> >>> computer and
> >>> > >> > started
> >>> > >> > > > to
> >>> > >> > > > > create a JIRA about this, I noticed that unfortunately this
> >>> has
> >>> > >> > already
> >>> > >> > > > > happened.
> >>> > >> > > > >
> >>> > >> > > > > The encryption-releated parquet.thrift changes have not only
> >>> > been
> >>> > >> > added
> >>> > >> > > > to
> >>> > >> > > > > only parquet-format, but to parquet-cpp as well, and these
> >>> > changes
> >>> > >> > got
> >>> > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
> >>> because
> >>> > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> >>> > which
> >>> > >> is
> >>> > >> > only
> >>> > >> > > > > acceptable as long as the original is not released.
> >>> > Additionally,
> >>> > >> it
> >>> > >> > has
> >>> > >> > > > > been discussed that a formal voting should take place before
> >>> > >> > > > incorporating
> >>> > >> > > > > the encryption features in the format.
> >>> > >> > > > >
> >>> > >> > > > > Now that parquet-cpp has already shipped these changes, we
> >>> must
> >>> > >> > choose
> >>> > >> > > > the
> >>> > >> > > > > lesser evil of the following two options:
> >>> > >> > > > >
> >>> > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
> >>> risk
> >>> > >> that
> >>> > >> > > > >    encrypted data files already written with parquet-cpp
> >>> 1.5.0
> >>> > >> will
> >>> > >> > not
> >>> > >> > > > be
> >>> > >> > > > >    readable any more.
> >>> > >> > > > >    - Release the encryption in parquet-format as it is,
> >>> > >> regardless of
> >>> > >> > > > >    voting results and discard PARQUET-1419.
> >>> > >> > > > >
> >>> > >> > > > > Personally I have a hard time deciding which one I consider
> >>> > lesser
> >>> > >> > evil.
> >>> > >> > > > > What are your opinions?
> >>> > >> > > > >
> >>> > >> > > > > Thanks,
> >>> > >> > > > >
> >>> > >> > > > > Zoltan
> >>> > >> > > > >
> >>> > >> > > >
> >>> > >> >
> >>> > >>
> >>> > >
> >>> >
> >>>
> >>

Reply via email to