Hi,

I think we should use this id for its current purpose. This field had been
defined and merged months ago, and should be there is any scenario.
Except for the last week's change, the encryption format had been stable
for a while now. The timing of this change was unfortunate; but the change
was minor and done for a good reason.
I hope there won't be new disturbances from now on.

Cheers, Gidon.


On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <z...@cloudera.com.invalid>
wrote:

> Hi,
>
> It seems that I spoke too early. I just noticed that a new field was added
> to the ColumnChunk struct in
>
> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> Although the field is optional, we can't pretend it was never released,
> because parquet-cpp already expects that field for the id 8. Therefore, if
> we were to add a different field with id 8 to the ColumnChunk struct, the
> thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> reading new files, because it would try to read the ColumnChunk struct
> including all of its fields (regardless of whether the code actually uses
> them or not) and the field with id 8 would not match its Thrift schema.
>
> We still can avoid breaking changes in the ColumnChunk struct by either
> using id 8 for its current purpose or not using it at all (skip id 8 and
> use 9 for the next field we add).
>
> Br,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <z...@cloudera.com> wrote:
>
> > Hi,
> >
> > If the encryption code release in parquet-cpp is unused at this moment
> > then I think we are fine. It means that we are still free to decide any
> way
> > about the data structures without the risk of incompatility issues. In
> this
> > case I would suggest to proceed as we planned at the Parquet sync.
> >
> > Thanks,
> >
> > Zoltan
> >
> > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg5...@gmail.com>
> wrote:
> >
> >> That's correct. A layer that runs the crypto classes to encrypt pages
> and
> >> structures is not merged yet. Even if the parquet.thrift is out, there
> is
> >> virtually no chance encrypted files are created with it, and certainly
> not
> >> in production.
> >>
> >> Given a (hopefully mild) headache this last-minute feature/requirement
> has
> >> caused, let me add details on where it is coming from. I received the
> >> requirement from a company that got an internal pushback from teams
> >> working
> >> with unencrypted columns only, and unwilling to update Parquet libs.
> They
> >> need support for a transition period for the current readers.
> >>
> >> We have an option to say no, and stick to the original goal list. But I
> >> think the requirement makes sense, even if coming very late - and will
> >> significantly ease adoption of the Parquet encryption. Also, it turns
> out
> >> to be easy to implement, with minor changes in the code.
> >>
> >> In any case, its certainly a good idea to keep a single copy of
> >> parquet.thrift in the format repo.
> >>
> >> Cheers, Gidon
> >>
> >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <wesmck...@gmail.com>
> >> wrote:
> >>
> >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> >> > reading and writing files with encryption is not possible. Am I wrong
> >> > about that?
> >> >
> >> > I'm wholly supportive of changing the project to use a released
> >> > version of the Parquet format, so let's do that ASAP.
> >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg5...@gmail.com>
> >> wrote:
> >> > >
> >> > > Yep! (sent in parallel :)
> >> > >
> >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> <z...@cloudera.com.invalid
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> >> although
> >> > in
> >> > > > its current form it is a breaking change, it can be easily
> >> rewritten to
> >> > > > become backwards-compatible so this part of the problem does not
> >> apply
> >> > any
> >> > > > more.
> >> > > >
> >> > > > Br,
> >> > > >
> >> > > > Zoltan
> >> > > >
> >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <z...@cloudera.com>
> >> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > On the Parquet sync we discussed that the practice of
> maintaining
> >> a
> >> > copy
> >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must
> >> take
> >> > care
> >> > > > to
> >> > > > > not release parquet-format changes in parquet-cpp before we
> >> > officially
> >> > > > > release them in parquet-format. As I got back to my computer and
> >> > started
> >> > > > to
> >> > > > > create a JIRA about this, I noticed that unfortunately this has
> >> > already
> >> > > > > happened.
> >> > > > >
> >> > > > > The encryption-releated parquet.thrift changes have not only
> been
> >> > added
> >> > > > to
> >> > > > > only parquet-format, but to parquet-cpp as well, and these
> changes
> >> > got
> >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> which
> >> is
> >> > only
> >> > > > > acceptable as long as the original is not released.
> Additionally,
> >> it
> >> > has
> >> > > > > been discussed that a formal voting should take place before
> >> > > > incorporating
> >> > > > > the encryption features in the format.
> >> > > > >
> >> > > > > Now that parquet-cpp has already shipped these changes, we must
> >> > choose
> >> > > > the
> >> > > > > lesser evil of the following two options:
> >> > > > >
> >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk
> >> that
> >> > > > >    encrypted data files already written with parquet-cpp 1.5.0
> >> will
> >> > not
> >> > > > be
> >> > > > >    readable any more.
> >> > > > >    - Release the encryption in parquet-format as it is,
> >> regardless of
> >> > > > >    voting results and discard PARQUET-1419.
> >> > > > >
> >> > > > > Personally I have a hard time deciding which one I consider
> lesser
> >> > evil.
> >> > > > > What are your opinions?
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Zoltan
> >> > > > >
> >> > > >
> >> >
> >>
> >
>

Reply via email to