[
https://issues.apache.org/jira/browse/PARQUET-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992782#comment-16992782
]
Kai Liu commented on PARQUET-1155:
----------------------------------
Hi, Machiel,
Did you get any traction on this issue? And would you mind sharing the approach
you are taking to address the GDPR requirement in your system now?
Kai Liu
> Support for GDPR erase requirements
> -----------------------------------
>
> Key: PARQUET-1155
> URL: https://issues.apache.org/jira/browse/PARQUET-1155
> Project: Parquet
> Issue Type: Wish
> Components: parquet-format
> Affects Versions: 1.8.2
> Reporter: Machiel Groeneveld
> Priority: Major
>
> As I understand it, Parquet is a write-once format, so mutating data inside
> Parquet files is not an option. A new EU-wide law (GDPR) comes into effect
> in May 2018 that requires companies to delete data pertaining to a customer
> when asked to do so.
> Our case is quite simple: our biggest Parquet tables collect 7.5 billion rows
> a month, so removing data by duplicating a table while filtering out the
> unwanted customer data is not feasible.
> Perhaps there is some way to remove particular data? Or perhaps there is an
> efficient way to do read/filter/write? Zeroing out the data in place might
> also be an option, since it would not change the layout of the files.
> I am not sure if this is the right platform to start this discussion, but I
> think more people will have this issue once it becomes clear that data needs
> to be deleted everywhere, including in Parquet files. Companies face
> multi-million-dollar fines if they don't comply with GDPR.
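
To make the "efficient read/filter/write" idea above concrete, here is a
minimal sketch. It is not Parquet code and not an approach anyone in this
thread has confirmed using. It models a file as a list of row groups in plain
Python and assumes each group exposes min/max statistics for a hypothetical
customer_id column, as Parquet row-group metadata can. All names (RowGroup,
erase_customer, the (customer_id, payload) schema) are made up for
illustration. Only groups whose range could contain the customer are filtered
and rewritten; all others are carried over untouched.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema for illustration: (customer_id, payload).
Row = Tuple[int, str]


@dataclass
class RowGroup:
    """Toy stand-in for a row group; assumed to be non-empty."""
    rows: List[Row]

    @property
    def min_id(self) -> int:
        # Stands in for the min statistic of the customer_id column.
        return min(r[0] for r in self.rows)

    @property
    def max_id(self) -> int:
        # Stands in for the max statistic of the customer_id column.
        return max(r[0] for r in self.rows)


def erase_customer(groups: List[RowGroup],
                   customer_id: int) -> Tuple[List[RowGroup], int]:
    """Return the row groups without the customer's rows, plus the number
    of groups that actually had to be rewritten."""
    out: List[RowGroup] = []
    rewritten = 0
    for g in groups:
        # The statistics rule this group out: carry it over unchanged.
        if customer_id < g.min_id or customer_id > g.max_id:
            out.append(g)
            continue
        kept = [r for r in g.rows if r[0] != customer_id]
        if len(kept) == len(g.rows):
            # The range matched but no row did: nothing to rewrite.
            out.append(g)
            continue
        rewritten += 1
        if kept:  # drop the group entirely if every row was erased
            out.append(RowGroup(kept))
    return out, rewritten


groups = [
    RowGroup([(1, "a"), (2, "b")]),
    RowGroup([(5, "c"), (7, "d"), (7, "e")]),
    RowGroup([(9, "f")]),
]
new_groups, rewritten = erase_customer(groups, 7)
```

The skip only pays off when rows are sorted or clustered by customer ID. If
one customer's rows are spread across most row groups, nearly every group
falls in range and the cost approaches a full rewrite of the table.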