Attendees: Gidon, Gabor, Fokko, Xu, Sri, Xinli
1. Column Encryption
   1. PR 800 <https://github.com/apache/parquet-mr/pull/800> - This PR
      merges the work to master and is being reviewed.
      1. One review comment is about CRC: since the AES-GCM encryption
         algorithm already includes an integrity check, computing a CRC
         is redundant.
      2. The behavior “CRC is enabled by default in the write path” will
         not be changed even when AES-GCM is used. The CRC calculation
         overhead is very small according to our earlier tests, and
         changing the behavior may break something.
   2. The PR <https://github.com/apache/parquet-mr/pull/801> for
      PARQUET-1396 will be moved to the master branch after PR 800
      <https://github.com/apache/parquet-mr/pull/800> is done.
2. Parquet 1.11.1 release
   1. Should an additional fix (PARQUET-1684
      <https://issues.apache.org/jira/browse/PARQUET-1684>) be added?
      After discussion, the conclusion is no. This is not a regression
      in Parquet 1.11, and the change itself is not low risk.
   2. The Spark rollout is still blocked, but downgrading the Avro
      version in Parquet is not an option.
3. Parquet 1.12 release
   1. After encryption is done, Gabor will create a Jira to start the
      process.
4. Proposal for CompressionCodec Provider-aware Compression Codec (doc
   <https://docs.google.com/document/d/1ueSYq2FIzaom23cpHXppig93ylOxe8CU6EwS82dov2E/edit#heading=h.5b2qz2ba32wm>)
   1. PR-803 <https://github.com/apache/parquet-mr/pull/803> needs to be
      reviewed.
5. Data masking
   1. After column encryption is done in master, we (Xinli, Gidon, and
      Sri) will start the conversation.

Please let me know if you have any questions.
---
Xinli | Uber Data Infra Team