[ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiashen Zhang updated PARQUET-2223:
-----------------------------------
Description:
h1. Background
h2. What is Data Masking?
Data masking is a technique used to protect sensitive data by replacing it with
modified or obscured values. The purpose of data masking is to ensure that
sensitive information, such as Personally Identifiable Information (PII),
remains hidden from unauthorized users while allowing authorized users to
perform their tasks.
Here are a few key points about data masking:
* Protection of Sensitive Data: Data masking helps to safeguard sensitive
data, such as Social Security numbers, credit card numbers, names, addresses,
and other personally identifiable information. By applying masking techniques,
the original values are replaced with fictional or transformed data that
retains the format and structure but removes any identifiable information.
* Controlled Access: Data masking enables controlled access to sensitive data.
Authorized users, typically with appropriate permissions, can access the
unmasked or original data, while unauthorized users or users without the
necessary permissions will only see the masked data.
* Various Masking Techniques: There are different masking techniques
available, depending on the specific data privacy requirements and use cases.
Some commonly used techniques include:
** Nullification: Replacing original data with NULL values.
** Randomization: Replacing sensitive data with randomly generated values.
** Substitution: Replacing sensitive data with fictional but realistic values.
** Hashing: Transforming sensitive data into irreversible hashed values.
** Redaction: Removing or masking specific parts of sensitive data while
retaining other non-sensitive information.
* Compliance and Data Privacy: Data masking is often employed to comply with
data protection regulations and maintain data privacy. By masking sensitive
data, we can reduce the risk of data breaches and unauthorized access while
still allowing legitimate users to perform their tasks.
* Maintaining Data Consistency: Data masking techniques aim to maintain data
consistency and integrity by ensuring that masked data retains the original
data's format, structure, and relationships. This allows applications and
processes that rely on the data to continue functioning correctly.
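To make the techniques listed above concrete, here is a minimal Python sketch of each one, plus a permission check for controlled access. It is an illustration only and is not part of Parquet or of this proposal: every function name, the salt parameter, and the "keep the last four characters" rule are assumptions chosen for the example.

```python
import hashlib
import random
import string


def nullify(value):
    """Nullification: replace the original value with None (NULL)."""
    return None


def randomize(value, rng):
    """Randomization: swap each letter or digit for a random one,
    leaving separators in place so the format is preserved."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def substitute(value, replacements):
    """Substitution: pick a fictional value from a fixed list.
    The same input always maps to the same replacement."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return replacements[int(digest, 16) % len(replacements)]


def hash_value(value, salt):
    """Hashing: one-way salted SHA-256 digest of the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def redact(value, visible=4, mask_char="*"):
    """Redaction: hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def read_value(value, authorized, mask):
    """Controlled access: authorized users get the original value,
    everyone else gets the masked one."""
    return value if authorized else mask(value)
```

In this sketch, nullification and hashing discard the original format, while randomization and redaction keep it, which matters when downstream code validates the shape of a value.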
h2. Why do we need it?
Data masking serves several purposes. The main reasons we need it are:
* Data Privacy and Compliance: Data masking helps us comply with data privacy
regulations such as the General Data Protection Regulation (GDPR) and the
Health Insurance Portability and Accountability Act (HIPAA). These regulations
require us to protect sensitive data and ensure that it is only accessible to
authorized individuals. Data masking enables us to comply with these
regulations by de-identifying sensitive data.
* Minimize Data Exposure: By masking sensitive data, we can reduce the risk of
data breaches and unauthorized access. If a security breach occurs, masked
values reveal far less to unauthorized users than the original values would.
This helps protect individuals' privacy and limits misuse of sensitive
information.
* Secure Testing and Development Environments: Data masking is particularly
useful in creating secure testing and development environments. By masking
sensitive data, we can use realistic but fictional data for testing, analysis,
and development activities without exposing real personal or sensitive
information.
* Enhanced Data Sharing: Data masking allows us to share data with external
parties, such as partners or third-party vendors, while protecting sensitive
information. Masked data can be shared with confidence, as the original
sensitive values are replaced with transformed or fictional data.
* Employee Privacy: Data masking helps protect employee privacy by obfuscating
sensitive employee information, such as Social Security numbers or salary
details, in databases or HR systems. This safeguards employees' personal data
from unauthorized access or internal misuse.
* Insider Threat Mitigation: Data masking reduces the risk posed by insider
threats, where individuals inside the organization intentionally or
accidentally misuse or expose sensitive data. With masking, individuals who can
reach a dataset but lack permission for its sensitive columns will only see
masked or fictional values, limiting the potential damage caused by internal
security breaches.
* Flexibility and Granularity: Data masking techniques offer flexibility and
granularity in selecting the level of masking required for different types of
data. We can determine the appropriate masking technique based on the
sensitivity of the data and the specific use case.
Overall, data masking is essential for protecting sensitive data, maintaining
compliance with regulations, mitigating data breach risks, and enabling secure
data sharing and testing environments. It plays a crucial role in ensuring data
privacy and maintaining the trust of individuals whose data is being processed.
was:
h1. Background
h2. What is Data Masking?
Data masking is the process of obfuscating sensitive data. Instead of revealing
PII data, masking allows us to return NULLs, hashes or redacted data in its
place. With data masking, users who are in the correct permission groups can
retrieve the original data and users without permissions will receive masked
data.
h2. Why do we need it?
* Fine-Grained Access Control
h2. Why do we want to enhance data masking?
Users might not have permission to read every column, and the existing code
cannot skip columns that a user is not permitted to access. This enhancement
adds that support, so users can choose to skip such columns and avoid
decryption errors.
h1. Design Requirements
# Users can skip some columns with a configuration
h1. Proposed solution
The key idea is to modify the requested schema by removing the skipped columns
from it.
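The schema-pruning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schema is modeled as a flat list of (column name, type) pairs, the skipped columns come from a configuration value, and `prune_schema` is a hypothetical helper, not an existing Parquet API.

```python
def prune_schema(requested_schema, skipped_columns):
    """Return a copy of the requested schema without the skipped columns.

    requested_schema: list of (column_name, type_name) pairs (assumed model).
    skipped_columns: iterable of column names taken from configuration.
    """
    skipped = set(skipped_columns)
    return [(name, type_name)
            for name, type_name in requested_schema
            if name not in skipped]


# Hypothetical schema: the reader has no key for the encrypted "ssn" column.
requested = [("id", "int64"), ("name", "binary"), ("ssn", "binary")]
pruned = prune_schema(requested, {"ssn"})
```

Because the skipped column never appears in the schema handed to the reader, the reader would not attempt to decrypt it, which is how the decryption error is avoided in this sketch.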
> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
> Key: PARQUET-2223
> URL: https://issues.apache.org/jira/browse/PARQUET-2223
> Project: Parquet
> Issue Type: New Feature
> Reporter: Jiashen Zhang
> Priority: Major