[jira] [Updated] (PARQUET-2223) Parquet Data Masking for Column Encryption

Jiashen Zhang (Jira) Wed, 14 Jun 2023 01:03:04 -0700


     [ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jiashen Zhang updated PARQUET-2223:
-----------------------------------
    Description: 
h1. Background
h2. What is Data Masking?

Data masking is a technique used to protect sensitive data by replacing it with 
modified or obscured values. The purpose of data masking is to ensure that 
sensitive information, such as Personally Identifiable Information (PII), 
remains hidden from unauthorized users while allowing authorized users to 
perform their tasks.

Here are a few key points about data masking:
 * Protection of Sensitive Data: Data masking helps safeguard sensitive
data, such as Social Security numbers, credit card numbers, names, addresses,
and other personally identifiable information. Masking replaces the original
values with fictional or transformed data that often keeps the original format
and structure but no longer identifies anyone.
 * Controlled Access: Data masking enables controlled access to sensitive data.
Users with the appropriate permissions can access the original, unmasked data,
while users without those permissions see only the masked data.
 * Various Masking Techniques: There are different masking techniques 
available, depending on the specific data privacy requirements and use cases. 
Some commonly used techniques include:
 ** Nullification: Replacing original data with NULL values.
 ** Randomization: Replacing sensitive data with randomly generated values.
 ** Substitution: Replacing sensitive data with fictional but realistic values.
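 ** Hashing: Transforming sensitive data into one-way hashed values (ideally
keyed or salted, since plain hashes of low-entropy values such as phone numbers
can be reversed by brute force).
 ** Redaction: Removing or masking specific parts of sensitive data while
retaining other non-sensitive information.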
 * Compliance and Data Privacy: Data masking is often employed to comply with 
data protection regulations and maintain data privacy. By masking sensitive 
data, we can reduce the risk of data breaches and unauthorized access while 
still allowing legitimate users to perform their tasks.
 * Maintaining Data Consistency: Masking techniques aim to preserve the
original data's format, structure, and relationships. For example, if the same
customer ID always maps to the same masked value, joins between masked tables
still work. This allows applications and processes that rely on the data to
continue functioning correctly.
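
The masking techniques listed above can be sketched on single string values. This is a minimal, hypothetical illustration rather than anything in parquet-mr: the function names, the fake-name list, and the choice of HMAC-SHA256 as the keyed hash are assumptions. The keyed variants are deterministic, which also illustrates the consistency point: the same input always produces the same masked output.

```python
import hashlib
import hmac
import random
import string
from typing import Optional

# Hypothetical helpers: each masks a single string value. A real Parquet
# implementation would operate on whole columns and every physical type.

FAKE_NAMES = ["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"]  # illustrative only


def nullify(value: str) -> Optional[str]:
    """Nullification: replace the value with NULL (None)."""
    return None


def randomize(value: str, rng: random.Random) -> str:
    """Randomization: replace each digit/letter with a random one of the same
    kind, keeping separators so the format survives."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def _keyed_digest(value: str, key: bytes) -> bytes:
    """HMAC-SHA256 of the value under a secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def substitute(value: str, key: bytes) -> str:
    """Substitution: deterministically map the value to a realistic fake."""
    return FAKE_NAMES[_keyed_digest(value, key)[0] % len(FAKE_NAMES)]


def hash_value(value: str, key: bytes) -> str:
    """Hashing: keyed one-way hash, rendered as 64 hex characters."""
    return _keyed_digest(value, key).hex()


def redact(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Redaction: mask all but the last `keep_last` letters/digits,
    leaving separators such as '-' in place."""
    total = sum(ch.isalnum() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isalnum():
            seen += 1
            out.append(ch if seen > total - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)


print(redact("123-45-6789"))  # ***-**-6789
```

In practice the key would come from a managed secret rather than a literal, so that masked values cannot be recomputed by anyone who merely knows the algorithm.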

h2. Why do we need it?

Data masking serves several important purposes and provides numerous benefits. 
Here are some reasons why we need data masking:
 * Data Privacy and Compliance: Data masking helps us comply with data privacy
regulations such as the General Data Protection Regulation (GDPR) and the
Health Insurance Portability and Accountability Act (HIPAA). These regulations
require us to protect sensitive data and ensure that it is only accessible to
authorized individuals. Data masking helps us meet these requirements by
de-identifying sensitive data.
 * Minimize Data Exposure: By masking sensitive data, we can reduce the risk of
data breaches and unauthorized access. If masked data is exposed in a security
breach, it reveals little to unauthorized users. This helps protect
individuals' privacy and reduces the risk of misuse of sensitive information.
 * Secure Testing and Development Environments: Data masking is particularly 
useful in creating secure testing and development environments. By masking 
sensitive data, we can use realistic but fictional data for testing, analysis, 
and development activities without exposing real personal or sensitive 
information.
 * Enhanced Data Sharing: Data masking allows us to share data with external
parties, such as partners or third-party vendors, while protecting sensitive
information. Masked data can be shared more safely because the original
sensitive values are replaced with transformed or fictional data.
 * Employee Privacy: Data masking helps protect employee privacy by obfuscating
sensitive employee information, such as Social Security numbers or salary
details, in databases or HR systems. This safeguards employees' personal data
from unauthorized access or internal misuse.
 * Insider Threat Mitigation: Data masking reduces the risk posed by insider
threats, where individuals with legitimate access intentionally or accidentally
misuse or expose sensitive data. Users who can query a dataset but are not
entitled to its sensitive fields see only masked or fictional values, which
limits the potential damage caused by internal security breaches.
 * Flexibility and Granularity: Data masking techniques offer flexibility and 
granularity in selecting the level of masking required for different types of 
data. We can determine the appropriate masking technique based on the 
sensitivity of the data and the specific use case.

Overall, data masking is essential for protecting sensitive data, maintaining 
compliance with regulations, mitigating data breach risks, and enabling secure 
data sharing and testing environments. It plays a crucial role in ensuring data 
privacy and maintaining the trust of individuals whose data is being processed.

  was:
h1. Background
h2. What is Data Masking?

Data masking is the process of obfuscating sensitive data. Instead of revealing
PII, masking allows us to return NULLs, hashes, or redacted data in its place.
With data masking, users who are in the correct permission groups can retrieve
the original data, and users without permissions receive masked data.
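
The permission-group behavior described above can be sketched as a per-column policy lookup at read time. This is a hypothetical illustration: `ColumnPolicy`, `read_value`, and the group names are invented for the example, and treating columns without a policy as unrestricted is an assumption (a stricter design could deny by default).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class ColumnPolicy:
    """Hypothetical per-column masking policy."""
    allowed_groups: Set[str]                        # groups that see original values
    mask: Callable[[Optional[str]], Optional[str]]  # applied for everyone else


def read_value(
    column: str,
    value: Optional[str],
    user_groups: Set[str],
    policies: Dict[str, ColumnPolicy],
) -> Optional[str]:
    """Return the original value if the user belongs to an allowed group,
    otherwise the masked value. Columns without a policy pass through
    unchanged (an assumption made for this sketch)."""
    policy = policies.get(column)
    if policy is None or user_groups & policy.allowed_groups:
        return value
    return policy.mask(value)


# Example: only the "hr" group may see Social Security numbers.
policies = {"ssn": ColumnPolicy(allowed_groups={"hr"}, mask=lambda v: None)}
print(read_value("ssn", "123-45-6789", {"hr"}, policies))           # 123-45-6789
print(read_value("ssn", "123-45-6789", {"engineering"}, policies))  # None
```
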
h2. Why do we need it?
 * Fine-Grained Access Control

h2. Why do we want to enhance data masking?


Users might not have permission to access every column, and the existing code
provides no way to skip columns that a user is not permitted to access. This
enhancement adds that support, so that users can choose to skip those columns
and avoid decryption errors.
h1. Design Requirements
 # Users can skip selected columns through a configuration setting.
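
One way to express this requirement is a single configuration entry listing the columns to skip. The sketch below is hypothetical: the key name `parquet.read.skip.columns` and the comma-separated, dot-separated path format are assumptions for illustration, not an existing parquet-mr property.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical configuration key, used for illustration only.
SKIP_COLUMNS_KEY = "parquet.read.skip.columns"


def parse_skip_columns(conf: Dict[str, str]) -> FrozenSet[Tuple[str, ...]]:
    """Parse a comma-separated list of dot-separated column paths,
    e.g. "ssn, address.zip", into a set of path tuples."""
    raw = conf.get(SKIP_COLUMNS_KEY, "")
    paths = set()
    for entry in raw.split(","):
        entry = entry.strip()
        if entry:  # tolerate empty entries such as a trailing comma
            paths.add(tuple(part.strip() for part in entry.split(".")))
    return frozenset(paths)


print(sorted(parse_skip_columns({SKIP_COLUMNS_KEY: "ssn, address.zip,"})))
# [('address', 'zip'), ('ssn',)]
```

A plain dot separator cannot express column names that themselves contain dots; a real configuration format would need an escaping rule for that case.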

h1. Proposed solution

The key idea is to modify the requested schema by removing the skipped columns
from it.
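
The schema change can be sketched as a recursive filter over the schema tree. In parquet-mr the requested schema is a `MessageType`; here a plain nested dict stands in for it, and the function name `prune_schema` is hypothetical. Dropping groups that lose all of their fields is an assumption about the desired behavior. Because the pruned columns are never requested, the reader would presumably never need their decryption keys.

```python
from typing import Any, Dict, FrozenSet, Tuple

# A plain nested dict stands in for a Parquet schema: a dict value is a
# group, any other value is a primitive column type name.
Schema = Dict[str, Any]


def prune_schema(
    schema: Schema,
    skip: FrozenSet[Tuple[str, ...]],
    prefix: Tuple[str, ...] = (),
) -> Schema:
    """Return a copy of `schema` without the skipped column paths.

    Skipping a group path removes the whole group; groups left with no
    fields are dropped as well.
    """
    pruned: Schema = {}
    for name, child in schema.items():
        path = prefix + (name,)
        if path in skip:
            continue  # the user asked to skip this column or group
        if isinstance(child, dict):
            sub = prune_schema(child, skip, path)
            if sub:  # drop groups that lost all of their fields
                pruned[name] = sub
        else:
            pruned[name] = child
    return pruned


file_schema = {
    "id": "INT64",
    "ssn": "BINARY",
    "address": {"street": "BINARY", "zip": "BINARY"},
    "contact": {"phone": "BINARY"},
}
requested = prune_schema(
    file_schema, frozenset({("ssn",), ("address", "zip"), ("contact", "phone")})
)
print(requested)  # {'id': 'INT64', 'address': {'street': 'BINARY'}}
```

The function builds a new tree instead of mutating the file schema, so the original schema remains available for validation or error messages.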


> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Jiashen Zhang
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
