[jira] [Commented] (PARQUET-2223) Parquet Data Masking for Column Encryption

ASF GitHub Bot (Jira) Sun, 15 Jan 2023 22:29:04 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677173#comment-17677173
 ]


ASF GitHub Bot commented on PARQUET-2223:
-----------------------------------------

ggershinsky commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1383552737

   As far as I understand, _data masking_ replaces the content of sensitive 
columns; it does not remove the columns (schema and content). Removal is 
done by _column pruning_, when re-writing a file. Neither of these is related 
to _column encryption_. So I'm not fully sure what the goal of the mechanism 
in this PR is. Maybe we can start with a Google doc that describes the problem, the 
goals and the solution design?
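
To make the distinction concrete, here is a minimal Python sketch of the two operations on a toy in-memory table. It is an illustration only, not parquet-mr code: the table layout, the column names, and the helper names `mask_column` and `prune_column` are all assumptions made for this example.

```python
import hashlib

# Hypothetical in-memory table: column name -> list of values.
table = {
    "id": [1, 2],
    "ssn": ["123-45-6789", "987-65-4321"],
}


def mask_column(table, column, mode="null"):
    """Data masking: keep the column in the schema, replace its content."""
    def mask(value):
        if mode == "null":
            return None
        if mode == "hash":
            return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        raise ValueError(f"unknown masking mode: {mode}")

    return {
        name: [mask(v) for v in values] if name == column else list(values)
        for name, values in table.items()
    }


def prune_column(table, column):
    """Column pruning: drop the column (schema and content) entirely."""
    return {name: list(values) for name, values in table.items() if name != column}


masked = mask_column(table, "ssn")   # "ssn" is still present, values are None
pruned = prune_column(table, "ssn")  # "ssn" is gone from the result
```

After masking, the `ssn` column is still part of the result but carries no sensitive values; after pruning, the column no longer exists at all.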




> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Jiashen Zhang
>            Priority: Minor
>
> h1. Background
> h2. What is Data Masking?
> Data masking is the process of obfuscating sensitive data. Instead of 
> revealing PII data, masking allows us to return NULLs, hashes or redacted 
> data in its place. With data masking, users who are in the correct permission 
> groups can retrieve the original data and users without permissions will 
> receive masked data.
> h2. Why do we need it?
>  * Fine-Grained Access Control
> h2. Why do we want to enhance data masking?
>  
> Users might not have permissions for all columns, and the existing code does 
> not let us skip columns that users do not have permission to access. This 
> enhancement adds that support, so that users can choose to skip some columns 
> and avoid decryption errors.
> h1. Design Requirements
>  # Users can skip some columns with a configuration
> h1. Proposed solution
> The key idea is to modify the requested schema by removing the skipped 
> columns from it.
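
The proposed solution can be sketched roughly as follows. This is a simplified Python illustration, not the implementation in the PR: the schema is modeled as a flat list of column names, and `build_requested_schema` and `skipped_columns` are hypothetical names standing in for whatever the actual configuration and reader code use.

```python
def build_requested_schema(file_schema, skipped_columns):
    """Return the requested schema with skipped columns removed, keeping order."""
    skipped = set(skipped_columns)
    return [column for column in file_schema if column not in skipped]


# Columns present in the file (hypothetical names).
file_schema = ["id", "name", "ssn", "salary"]

# Hypothetical configuration value: columns the user cannot decrypt.
skipped_columns = ["ssn", "salary"]

requested_schema = build_requested_schema(file_schema, skipped_columns)
print(requested_schema)  # ['id', 'name']
```

Because the skipped columns never appear in the requested schema, a reader built this way would not attempt to decrypt them, which is how the decryption error is avoided.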



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
