Hi all,

Now that the encryption mechanism is mostly complete, we are starting a long-term project on a new security feature built on top of encryption. Called "data obfuscation", it combines masking and anonymization of sensitive data.

https://issues.apache.org/jira/browse/PARQUET-1376

On the one hand, basic masking can be easily implemented on top of Parquet by simply adding columns with masked (hashed, redacted, etc.) versions of the original column data. On the other hand, if done improperly, data masking can leak sensitive information. For these two reasons, we have decided not to rush it; this feature is not planned for the upcoming Parquet versions.

Following an initial discussion, we have produced a write-up on the goals, challenges and possible approaches. Before drafting the design, we would like to ask the community for feedback on this write-up (e.g. via comments inside the doc). Any real-life examples, use cases and requirements are very welcome.

https://docs.google.com/document/d/1LMs74uhqvMNJacBySPnWq6tM8qIpgcIZz444c7vfibM/edit?usp=sharing

Cheers,
Gidon, Xinli, Shri
