Hi Gidon,

Was there prior discussion on this on the mailing list? I left a comment on the document, but it isn't clear to me why this particular use-case needs to be part of the core Parquet library.
Are there motivating use-cases that wouldn't be served by an external library/application level?

Thanks,
Micah

On Mon, Aug 3, 2020 at 11:20 PM Gidon Gershinsky <[email protected]> wrote:

> Hi all,
>
> Now that the encryption mechanism is mostly complete, we are starting a
> long-term project on a new security feature on top of encryption. Called
> "data obfuscation", it combines masking and anonymization of sensitive
> data.
> https://issues.apache.org/jira/browse/PARQUET-1376
>
> On the one hand, a basic masking can be easily implemented on top of
> Parquet, by simply adding columns with masked (hashed, redacted, etc)
> versions of the original column data. On the other hand, if done
> improperly, data masking can leak out the sensitive information. For these
> two reasons, we have decided not to rush it, this feature is not planned
> for the upcoming Parquet versions. Following an initial discussion, we have
> produced a write up on the goals, challenges and possible approaches.
> Before drafting the design, we start with a call to the community to
> provide feedback on this write up (eg via comments inside the doc). Any
> real-life examples, usecases, requirements are very welcome.
>
> https://docs.google.com/document/d/1LMs74uhqvMNJacBySPnWq6tM8qIpgcIZz444c7vfibM/edit?usp=sharing
>
> Cheers,
> Gidon, Xinli, Shri
