Hi Gidon,
Was there prior discussion of this on the mailing list? I left a comment
on the document, but it isn't clear to me why this particular use-case
needs to be part of the core Parquet library.

Are there motivating use-cases that wouldn't be served by an external
library or at the application level?

Thanks,
Micah

On Mon, Aug 3, 2020 at 11:20 PM Gidon Gershinsky <[email protected]> wrote:

> Hi all,
>
> Now that the encryption mechanism is mostly complete, we are starting a
> long-term project on a new security feature on top of encryption. Called
> "data obfuscation", it combines masking and anonymization of sensitive
> data.
> https://issues.apache.org/jira/browse/PARQUET-1376
>
> On the one hand, basic masking can easily be implemented on top of
> Parquet by simply adding columns with masked (hashed, redacted, etc.)
> versions of the original column data. On the other hand, if done
> improperly, data masking can leak the sensitive information. For these
> two reasons, we have decided not to rush it; this feature is not planned
> for the upcoming Parquet versions. Following an initial discussion, we have
> produced a write-up on the goals, challenges, and possible approaches.
> Before drafting the design, we are starting with a call to the community to
> provide feedback on this write-up (e.g. via comments inside the doc). Any
> real-life examples, use cases, or requirements are very welcome.
>
>
> https://docs.google.com/document/d/1LMs74uhqvMNJacBySPnWq6tM8qIpgcIZz444c7vfibM/edit?usp=sharing
>
>
> Cheers,
> Gidon, Xinli, Shri
>
