[ https://issues.apache.org/jira/browse/PARQUET-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992782#comment-16992782 ]

Kai Liu commented on PARQUET-1155:
----------------------------------

Hi Machiel,
Did you get any traction on this issue? And would you mind sharing the approach
you are now taking to address the GDPR requirements in your system?

Kai Liu

> Support for GDPR erase requirements
> -----------------------------------
>
>                 Key: PARQUET-1155
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1155
>             Project: Parquet
>          Issue Type: Wish
>          Components: parquet-format
>    Affects Versions: 1.8.2
>            Reporter: Machiel Groeneveld
>            Priority: Major
>
> As I understand it, Parquet is a write-once format, so mutating data inside
> Parquet files is not an option. Now a new EU-wide law comes into effect in
> May 2018 that requires companies to delete data pertaining to a customer
> if asked to do so.
> Our case is quite simple: our biggest Parquet tables collect 7.5 billion rows
> a month, so removing data by duplicating a table while filtering out the
> unwanted customer data is not feasible.
> Perhaps there is some way to remove particular data? Or perhaps there is an
> efficient way to do read/filter/write? Zeroing out the data might also be an
> option, since it would not change the layout of the files.
> I'm not sure this is the right platform to start this discussion, but I think
> more people will run into this issue once it becomes clear that data needs to
> be deleted everywhere, including in Parquet files. Companies face
> multi-million-dollar fines if they don't comply with GDPR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
