[ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733538#comment-17733538 ]
Gidon Gershinsky commented on PARQUET-2223:
-------------------------------------------
Yep, I also think so. I'll have a look at the current version of the design
document.
> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
> Key: PARQUET-2223
> URL: https://issues.apache.org/jira/browse/PARQUET-2223
> Project: Parquet
> Issue Type: New Feature
> Reporter: Jiashen Zhang
> Priority: Major
>
> h1. Background
> h2. What is Data Masking?
> Data masking is a technique used to protect sensitive data by replacing it
> with modified or obscured values. The purpose of data masking is to ensure
> that sensitive information, such as Personally Identifiable Information
> (PII), remains hidden from unauthorized users while allowing authorized users
> to perform their tasks.
> Here are a few key points about data masking:
> * Protection of Sensitive Data: Data masking helps to safeguard sensitive
> data, such as Social Security numbers, credit card numbers, names, addresses,
> and other personally identifiable information. Masking replaces the original
> values with fictional or transformed data that can retain the original format
> and structure while removing identifiable information.
> * Controlled Access: Data masking enables controlled access to sensitive
> data. Users with the appropriate permissions can access the original,
> unmasked data, while all other users see only the masked data.
> * Various Masking Techniques: There are different masking techniques
> available, depending on the specific data privacy requirements and use cases.
> Some commonly used techniques include:
> ** Nullification: Replacing original data with NULL values.
> ** Randomization: Replacing sensitive data with randomly generated values.
> ** Substitution: Replacing sensitive data with fictional but realistic
> values.
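> ** Hashing: Transforming sensitive data into one-way hashed values. For
> low-entropy data such as Social Security numbers, a keyed hash (for example,
> HMAC with a secret key) is needed, because the original values behind an
> unkeyed hash can be recovered by hashing every possible input.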
> ** Redaction: Removing or masking specific parts of sensitive data while
> retaining other non-sensitive information.
> * Compliance and Data Privacy: Data masking is often employed to comply with
> data protection regulations and maintain data privacy. By masking sensitive
> data, we can reduce the risk of data breaches and unauthorized access while
> still allowing legitimate users to perform their tasks.
> * Maintaining Data Consistency: Data masking techniques aim to maintain data
> consistency and integrity by ensuring that masked data retains the original
> data's format, structure, and relationships. This allows applications and
> processes that rely on the data to continue functioning correctly.
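The five masking techniques listed above can be illustrated with a short, self-contained Python sketch. It is not part of Parquet or of the design proposed in PARQUET-2223. All function names (nullify, randomize, substitute, hash_value, redact) and their parameters are hypothetical. They only show what each technique does to a single string value such as a Social Security number.

```python
# Illustrative sketch only: hypothetical helpers, not a Parquet API.
import hashlib
import hmac
import random
from typing import Optional, Sequence


def nullify(value: str) -> Optional[str]:
    """Nullification: replace the original value with NULL (None)."""
    return None


def randomize(value: str, rng: random.Random) -> str:
    """Randomization: replace each digit with a random digit, keeping separators."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def substitute(value: str, fakes: Sequence[str]) -> str:
    """Substitution: map the value to a fictional but realistic stand-in.

    The index is derived from a SHA-256 digest of the input, so the same
    input always maps to the same stand-in.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return fakes[int.from_bytes(digest[:8], "big") % len(fakes)]


def hash_value(value: str, key: bytes) -> str:
    """Hashing: keyed one-way hash (HMAC-SHA256), hex-encoded."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Redaction: mask all but the last `keep_last` digits, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    ssn = "123-45-6789"
    print(nullify(ssn))                              # None
    print(randomize(ssn, random.Random(42)))         # random digits, same layout
    print(substitute("Alice", ["Jane Doe", "John Roe"]))
    print(hash_value(ssn, b"demo-key")[:16])         # prefix of the hex digest
    print(redact(ssn))                               # ***-**-6789
```

Keyed hashing is deterministic, so equal inputs map to equal outputs and joins on the masked column still work. This relates to the data-consistency point above. Randomization, by contrast, draws fresh digits on every call.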
> h2. Why do we need it?
> Data masking serves several important purposes and provides numerous
> benefits. Here are some reasons why we need data masking:
> * Data Privacy and Compliance: Data masking helps us comply with data
> privacy regulations such as the General Data Protection Regulation (GDPR) and
> the Health Insurance Portability and Accountability Act (HIPAA). These
> regulations require us to protect sensitive data and ensure that it is only
> accessible to authorized individuals. Data masking enables us to comply with
> these regulations by de-identifying sensitive data.
> * Minimize Data Exposure: By masking sensitive data, we can reduce the risk
> of data breaches and unauthorized access. If a security breach occurs, the
> exposed masked values reveal little or nothing of the original sensitive
> data. This helps protect individuals' privacy and helps prevent misuse of
> sensitive information.
> * Secure Testing and Development Environments: Data masking is particularly
> useful in creating secure testing and development environments. By masking
> sensitive data, we can use realistic but fictional data for testing,
> analysis, and development activities without exposing real personal or
> sensitive information.
> * Enhanced Data Sharing: Data masking allows us to share data with external
> parties, such as partners or third-party vendors, while protecting sensitive
> information. Masked data can be shared with confidence, as the original
> sensitive values are replaced with transformed or fictional data.
> * Employee Privacy: Data masking helps protect employee privacy by
> obfuscating sensitive employee information, such as Social Security numbers
> or salary details, in databases or HR systems. This safeguards employees'
> personal data from unauthorized access or internal misuse.
> * Insider Threat Mitigation: Data masking reduces the risk posed by insider
> threats, where authorized individuals intentionally or accidentally misuse or
> expose sensitive data. By masking data, even individuals with access to the
> data will only see masked or fictional values, limiting the potential damage
> caused by internal security breaches.
> * Flexibility and Granularity: Data masking techniques offer flexibility and
> granularity in selecting the level of masking required for different types of
> data. We can determine the appropriate masking technique based on the
> sensitivity of the data and the specific use case.
> Overall, data masking is essential for protecting sensitive data, maintaining
> compliance with regulations, mitigating data breach risks, and enabling
> secure data sharing and testing environments. It plays a crucial role in
> ensuring data privacy and maintaining the trust of individuals whose data is
> being processed.