[ 
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733538#comment-17733538
 ] 

Gidon Gershinsky commented on PARQUET-2223:
-------------------------------------------

Yep, I also think so. I'll have a look at the current version of the design 
document.

> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Jiashen Zhang
>            Priority: Major
>
> h1. Background
> h2. What is Data Masking?
> Data masking is a technique used to protect sensitive data by replacing it 
> with modified or obscured values. The purpose of data masking is to ensure 
> that sensitive information, such as Personally Identifiable Information 
> (PII), remains hidden from unauthorized users while allowing authorized users 
> to perform their tasks.
> Here are a few key points about data masking:
>  * Protection of Sensitive Data: Data masking helps to safeguard sensitive 
> data, such as Social Security numbers, credit card numbers, names, addresses, 
> and other personally identifiable information. By applying masking 
> techniques, the original values are replaced with fictional or transformed 
> data that retains the format and structure but removes any identifiable 
> information.
>  * Controlled Access: Data masking enables controlled access to sensitive 
> data. Authorized users, typically with appropriate permissions, can access 
> the unmasked or original data, while unauthorized users or users without the 
> necessary permissions will only see the masked data.
>  * Various Masking Techniques: There are different masking techniques 
> available, depending on the specific data privacy requirements and use cases. 
> Some commonly used techniques include:
>  ** Nullification: Replacing original data with NULL values.
>  ** Randomization: Replacing sensitive data with randomly generated values.
>  ** Substitution: Replacing sensitive data with fictional but realistic 
> values.
>  ** Hashing: Transforming sensitive data into irreversible hashed values.
>  ** Redaction: Removing or masking specific parts of sensitive data while 
> retaining other non-sensitive information.
>  * Compliance and Data Privacy: Data masking is often employed to comply with 
> data protection regulations and maintain data privacy. By masking sensitive 
> data, we can reduce the risk of data breaches and unauthorized access while 
> still allowing legitimate users to perform their tasks.
>  * Maintaining Data Consistency: Data masking techniques aim to maintain data 
> consistency and integrity by ensuring that masked data retains the original 
> data's format, structure, and relationships. This allows applications and 
> processes that rely on the data to continue functioning correctly.
> h2. Why do we need it?
> Data masking serves several important purposes and provides numerous 
> benefits. Here are some reasons why we need data masking:
>  * Data Privacy and Compliance: Data privacy regulations such as the General 
> Data Protection Regulation (GDPR) and the Health Insurance Portability and 
> Accountability Act (HIPAA) require us to protect sensitive data and ensure 
> that it is accessible only to authorized individuals. Data masking supports 
> compliance with these regulations by de-identifying sensitive data.
>  * Minimize Data Exposure: By masking sensitive data, we can reduce the risk 
> of data breaches and unauthorized access. If a security breach exposes masked 
> data, that data is of little value to unauthorized users. This helps protect 
> individuals' privacy and prevents misuse of sensitive information.
>  * Secure Testing and Development Environments: Data masking is particularly 
> useful in creating secure testing and development environments. By masking 
> sensitive data, we can use realistic but fictional data for testing, 
> analysis, and development activities without exposing real personal or 
> sensitive information.
>  * Enhanced Data Sharing: Data masking allows us to share data with external 
> parties, such as partners or third-party vendors, while protecting sensitive 
> information. Masked data can be shared with confidence, as the original 
> sensitive values are replaced with transformed or fictional data.
>  * Employee Privacy: Data masking helps protect employee privacy by 
> obfuscating sensitive employee information, such as Social Security numbers 
> or salary details, in databases or HR systems. This safeguards employees' 
> personal data from unauthorized access or internal misuse.
>  * Insider Threat Mitigation: Data masking reduces the risk posed by insider 
> threats, where authorized individuals intentionally or accidentally misuse or 
> expose sensitive data. By masking data, individuals who can reach the data 
> but lack permission to see the original values will only see masked or 
> fictional values, limiting the potential damage caused by internal security 
> breaches.
>  * Flexibility and Granularity: Data masking techniques offer flexibility and 
> granularity in selecting the level of masking required for different types of 
> data. We can determine the appropriate masking technique based on the 
> sensitivity of the data and the specific use case.
> Overall, data masking is essential for protecting sensitive data, maintaining 
> compliance with regulations, mitigating data breach risks, and enabling 
> secure data sharing and testing environments. It plays a crucial role in 
> ensuring data privacy and maintaining the trust of individuals whose data is 
> being processed.
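
To make the masking techniques listed in the quoted description concrete (nullification, randomization, substitution, hashing, redaction), here is a minimal Python sketch. It is illustrative only and is not Parquet's API or the design proposed in this issue: all function names, the substitution table, and the key are hypothetical. For hashing, the sketch assumes a keyed HMAC-SHA256 rather than a bare hash, because low-entropy values such as Social Security numbers can be brute-forced from an unkeyed digest.

```python
import hashlib
import hmac
import secrets


def nullify(value):
    """Nullification: replace the original value with NULL (None)."""
    return None


def randomize(value: str) -> str:
    """Randomization: replace every digit with a random digit.

    Non-digit characters (separators, letters) are kept, so the
    overall format of the value is preserved.
    """
    return "".join(
        secrets.choice("0123456789") if ch.isdigit() else ch
        for ch in value
    )


def substitute(value: str, table: dict, default: str = "REDACTED") -> str:
    """Substitution: look up a fictional but realistic replacement.

    `table` is a hypothetical mapping from real to fictional values.
    """
    return table.get(value, default)


def hash_mask(value: str, key: bytes) -> str:
    """Hashing: irreversible keyed digest (HMAC-SHA256, an assumption).

    The same input and key always give the same output, so joins on
    the masked column still work.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(value: str, visible: int = 4, fill: str = "*") -> str:
    """Redaction: hide everything except the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[-visible:]


if __name__ == "__main__":
    ssn = "123-45-6789"
    print(nullify(ssn))
    print(randomize(ssn))
    print(substitute("Alice Smith", {"Alice Smith": "Jane Doe"}))
    print(hash_mask(ssn, key=b"hypothetical-secret-key"))
    print(redact(ssn))
```

Which technique fits depends on the use case: nullification and redaction discard information, while randomization and substitution keep the format, and keyed hashing keeps equality relationships between rows.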



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
