and sorry for the EH, PH typos in a couple of places, should've been EF, PF.

On Tue, Sep 18, 2018 at 11:19 AM Zoltan Ivanfi <[email protected]>
wrote:

> Hi,
>
> Just to clarify: PF~ allows older readers to read data as long as they only
> try to access unencrypted columns. What happens when older readers do try
> to access encrypted columns?
>
> Also, by older readers do you specifically mean the current Java library
> or all existing language bindings?
>
> Thanks,
>
> Zoltan
>
> On Tue, Sep 18, 2018 at 9:45 AM Gidon Gershinsky <[email protected]> wrote:
>
> > Hi all,
> >
> > This week, 8 months after the first call for goals feedback and
> > requirements :), I got a new one - enabling old Parquet readers to access
> > data of unencrypted columns in encrypted files.
> > Better late than never. But actually it doesn't sound unreasonable, and
> > deserves at least some consideration.
> >
> > Let me describe the options (the way I see them). Any community feedback
> > is welcome.
> >
> > But first, a little tech intro. Encrypted Parquet files can be created in
> > two modes - with an encrypted footer (let's call this 'EF' mode for the
> > purpose of this discussion), or with a plaintext footer ('PF' mode).
> > EF is significantly more secure - it protects all data and metadata in a
> > file, including the schema, number of rows, key-value properties, column
> > names, column sort order, list of encrypted columns and metadata of the
> > column encryption keys.
> > PF hides the data, but leaks all of these metadata fields. Moreover, EF
> > makes the footer tamper-proof, while PF doesn't.
> > The reason we have the PF option is to let users with relaxed security
> > requirements enable readers that don't have access to any keys to read
> > unencrypted columns in a file.
> >
> > For encrypted columns, both EH and PH hide the ColumnMetaData - including
> > the min/max stats, number of values, data offset, data size and other
> > fields. Old Parquet readers obviously can't read EF files. But they also
> > can't read PF files, because old readers need access to the data offset and
> > size of every column in a file, even if they try to read just one column
> > (this is fixed in an encryption pull request).
> >
> > Now, the options:
> >
> > 1) Don't allow old Parquet readers to read encrypted files. Organizations
> > that start working with encrypted data will update their analytic
> > frameworks to use a Parquet version that supports encryption. This includes
> > both frameworks that write/read encrypted columns, and frameworks that work
> > only with unencrypted columns. The former and latter can technically be the
> > same framework, just different instances of it. The update can be done in
> > one of the following ways:
> > a. Upgrade Parquet version to the latest one, supporting encryption. This
> > might require some changes in framework code, unrelated to encryption.
> > b. Use the original old Parquet version, with added encryption support
> > (requires rebuilding the framework, no code changes). This is not hard;
> > I'm doing it for Parquet 1.8.2 in order to build and run Spark 2.3.0 with
> > encrypted data.
> > I think I can post this for 1.8.2 and other versions, with some help from
> > the community.
> >
> > 2) Replace PF with PF~, in order to allow old Parquet readers to read
> > unencrypted columns in encrypted files. PF~ is a little less secure and a
> > little less elegant version of PF. Less secure because it has to expose
> > the offset and size of encrypted column data. But actually it's not
> > catastrophic, and in any case, organizations with higher security
> > requirements will use the EF mode. Others can start with PF~ for a
> > transition period, and switch to EF later.
> > PH~ requires changing 2 lines in the parquet.thrift file, and a few dozen
> > lines in the implementation. I've played with this today, seems quite
> > feasible.
> > So, unless the community strongly favors option 1, I'm inclined to
> > proceed with 2; it should take up to a week to get the PRs submitted.
> >
> > Cheers, Gidon.
> >
>
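
To make the three modes in the quoted message easier to compare, here is a rough Python sketch of what a reader without any keys could see in each of them. It is only a toy model of the rules as described above, not code from the pull request: the names `Mode`, `visible_column_fields` and `old_reader_can_read_unencrypted`, and the field labels, are made up for illustration and do not correspond to parquet.thrift or to any Parquet library API.

```python
from enum import Enum


class Mode(Enum):
    EF = "encrypted footer"
    PF = "plaintext footer"
    PF_TILDE = "plaintext footer, exposing offset/size of encrypted columns"


# Hypothetical labels for the ColumnMetaData fields named in the thread.
COLUMN_FIELDS = {"min_max_stats", "num_values", "data_offset", "data_size"}


def visible_column_fields(mode, column_encrypted):
    """Fields of one column that a reader with no keys can see."""
    if mode is Mode.EF:
        # The whole footer is encrypted, so nothing is visible.
        return set()
    if not column_encrypted:
        # Plaintext footer, unencrypted column: everything is visible.
        return set(COLUMN_FIELDS)
    if mode is Mode.PF_TILDE:
        # PF~ exposes only the offset and size of encrypted column data.
        return {"data_offset", "data_size"}
    # PF hides the ColumnMetaData of encrypted columns entirely.
    return set()


def old_reader_can_read_unencrypted(mode, columns_encrypted):
    """Old readers need the offset and size of every column in the file,
    even when they read only one column.

    columns_encrypted maps a column name to True if it is encrypted.
    """
    needed = {"data_offset", "data_size"}
    return all(
        needed <= visible_column_fields(mode, encrypted)
        for encrypted in columns_encrypted.values()
    )
```

For a file with one unencrypted and one encrypted column, this model says an old reader fails in EF and PF and succeeds only in PF~, which is the motivation for option 2. It does not say anything about what happens when an old reader actually tries to access an encrypted column.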
