[
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738130#comment-17738130
]
ASF GitHub Bot commented on PARQUET-2223:
-----------------------------------------
shangxinli commented on code in PR #1112:
URL: https://github.com/apache/parquet-mr/pull/1112#discussion_r1245235188
##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/HiddenColumnException.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto;
+
+import java.util.Arrays;
+
+import org.apache.parquet.ParquetRuntimeException;
+
+/**
+ * Thrown when a reader that doesn't have the key for an encrypted column
+ * tries to access its contents.
+ *
+ */
+public class HiddenColumnException extends ParquetRuntimeException {
+ private static final long serialVersionUID = 1L;
+ private static final String configHelpText = " If returning 'null' for encrypted columns is acceptable," +
+ " please add 'set parquet.crypto.read.masked.value.enabled=true;' before your query" +

Review Comment:
You can reword it to say that the user doesn't have permission, e.g.: "If
returning null for this non-permitted column is acceptable, set config
'parquet.crypto.read.masked.value.enabled' to 'true'."
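The behavior implied by the help text (a reader without the column key fails unless masked reads are enabled, in which case it gets null) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in PR #1112: the class `MaskedReadSketch`, the method `readColumn`, the nested stand-in exception, and the use of a plain `Map` for configuration are all hypothetical. Only the property name `parquet.crypto.read.masked.value.enabled` and the exception name come from the diff above, and the message follows the reviewer's permission-oriented wording.

```java
import java.util.Map;

public class MaskedReadSketch {
  // Property name taken from the help text in the diff above.
  static final String MASKED_READ_KEY = "parquet.crypto.read.masked.value.enabled";

  // Hypothetical stand-in for org.apache.parquet.crypto.HiddenColumnException,
  // kept dependency-free so the sketch compiles on its own.
  static class HiddenColumnException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    // Wording follows the reviewer's suggestion: lead with the permission problem.
    private static final String CONFIG_HELP_TEXT =
        " If returning null for this non-permitted column is acceptable,"
            + " set config '" + MASKED_READ_KEY + "' to 'true'.";

    HiddenColumnException(String column) {
      super("User does not have permission to read encrypted column '" + column + "'."
          + CONFIG_HELP_TEXT);
    }
  }

  // Hypothetical read path: returns the plaintext when the reader holds the key;
  // otherwise returns null (masked) or throws, depending on the flag.
  static String readColumn(String column, String plaintext, boolean hasKey,
                           Map<String, String> conf) {
    if (hasKey) {
      return plaintext;
    }
    boolean maskedReadEnabled =
        Boolean.parseBoolean(conf.getOrDefault(MASKED_READ_KEY, "false"));
    if (maskedReadEnabled) {
      return null; // nullification: the hidden column is read as null
    }
    throw new HiddenColumnException(column);
  }

  public static void main(String[] args) {
    Map<String, String> strict = Map.of();
    Map<String, String> lenient = Map.of(MASKED_READ_KEY, "true");
    System.out.println(readColumn("ssn", "123-45-6789", true, strict));   // 123-45-6789
    System.out.println(readColumn("ssn", "123-45-6789", false, lenient)); // null
    try {
      readColumn("ssn", "123-45-6789", false, strict);
    } catch (HiddenColumnException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Defaulting the flag to false keeps the safe behavior (fail loudly) unless the user explicitly opts in to null results.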
> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
> Key: PARQUET-2223
> URL: https://issues.apache.org/jira/browse/PARQUET-2223
> Project: Parquet
> Issue Type: New Feature
> Reporter: Jiashen Zhang
> Priority: Major
>
> h1. Background
> h2. What is Data Masking?
> Data masking is a technique used to protect sensitive data by replacing it
> with modified or obscured values. The purpose of data masking is to ensure
> that sensitive information, such as Personally Identifiable Information
> (PII), remains hidden from unauthorized users while allowing authorized users
> to perform their tasks.
> Here are a few key points about data masking:
> * Protection of Sensitive Data: Data masking helps to safeguard sensitive
> data, such as Social Security numbers, credit card numbers, names, addresses,
> and other personally identifiable information. By applying masking
> techniques, the original values are replaced with fictional or transformed
> data that retains the format and structure but removes any identifiable
> information.
> * Controlled Access: Data masking enables controlled access to sensitive
> data. Authorized users, typically with appropriate permissions, can access
> the unmasked or original data, while unauthorized users or users without the
> necessary permissions will only see the masked data.
> * Various Masking Techniques: There are different masking techniques
> available, depending on the specific data privacy requirements and use cases.
> Some commonly used techniques include:
> ** Nullification: Replacing original data with NULL values.
> ** Randomization: Replacing sensitive data with randomly generated values.
> ** Substitution: Replacing sensitive data with fictional but realistic
> values.
> ** Hashing: Transforming sensitive data into irreversible hashed values.
> ** Redaction: Removing or masking specific parts of sensitive data while
> retaining other non-sensitive information.
> * Compliance and Data Privacy: Data masking is often employed to comply with
> data protection regulations and maintain data privacy. By masking sensitive
> data, we can reduce the risk of data breaches and unauthorized access while
> still allowing legitimate users to perform their tasks.
> * Maintaining Data Consistency: Data masking techniques aim to maintain data
> consistency and integrity by ensuring that masked data retains the original
> data's format, structure, and relationships. This allows applications and
> processes that rely on the data to continue functioning correctly.
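The masking techniques listed above can be illustrated on a single string value. This is a minimal sketch under stated assumptions, not part of parquet-mr: the class `MaskingTechniquesSketch` and its method names are hypothetical, hashing uses SHA-256 from the JDK's `MessageDigest`, redaction keeps the last four letters/digits, and redaction and randomization keep separators so the masked value retains the original format (the consistency point above).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class MaskingTechniquesSketch {
  // Nullification: replace the value with null.
  static String nullify(String value) {
    return null;
  }

  // Randomization: replace each digit with a random digit, keeping separators.
  static String randomizeDigits(String value, Random rng) {
    StringBuilder out = new StringBuilder(value.length());
    for (char c : value.toCharArray()) {
      out.append(Character.isDigit(c) ? (char) ('0' + rng.nextInt(10)) : c);
    }
    return out.toString();
  }

  // Substitution: deterministically map a value to a fictional replacement, so the
  // same input always yields the same substitute. Collisions are possible, so this
  // is not a one-to-one mapping.
  static String substitute(String value, String[] fictionalPool) {
    return fictionalPool[Math.floorMod(value.hashCode(), fictionalPool.length)];
  }

  // Hashing: hex-encoded SHA-256 digest. Unsalted hashes of low-entropy values
  // (e.g. SSNs) can be brute-forced; a keyed hash or salt is advisable in practice.
  static String hash(String value) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 must be available in every JDK", e);
    }
  }

  // Redaction: mask every letter/digit except the last `keep`, leaving
  // separators such as '-' in place.
  static String redact(String value, int keep) {
    int total = 0;
    for (char c : value.toCharArray()) {
      if (Character.isLetterOrDigit(c)) {
        total++;
      }
    }
    StringBuilder out = new StringBuilder(value.length());
    int seen = 0;
    for (char c : value.toCharArray()) {
      if (Character.isLetterOrDigit(c)) {
        seen++;
        out.append(seen > total - keep ? c : '*');
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String ssn = "123-45-6789";
    System.out.println(nullify(ssn));   // null
    System.out.println(redact(ssn, 4)); // ***-**-6789
    System.out.println(randomizeDigits(ssn, new Random(42)));
    System.out.println(hash(ssn));
    System.out.println(substitute("Alice Smith", new String[] {"Jane Doe", "John Roe"}));
  }
}
```

Of these, only nullification matches the "return null" behavior discussed in the review comment; the others are shown for comparison.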
> h2. Why do we need it?
> Data masking serves several important purposes and provides numerous
> benefits. Here are some reasons why we need data masking:
> * Data Privacy and Compliance: Data masking helps us comply with data
> privacy regulations such as the General Data Protection Regulation (GDPR) and
> the Health Insurance Portability and Accountability Act (HIPAA). These
> regulations require us to protect sensitive data and ensure that it is only
> accessible to authorized individuals. Data masking enables us to comply with
> these regulations by de-identifying sensitive data.
> * Minimize Data Exposure: By masking sensitive data, we can reduce the risk
> of data breaches and unauthorized access. If a security breach occurs, the
> exposed data will be meaningless to unauthorized users due to the masking.
> This helps protect individuals' privacy and prevents misuse of sensitive
> information.
> * Secure Testing and Development Environments: Data masking is particularly
> useful in creating secure testing and development environments. By masking
> sensitive data, we can use realistic but fictional data for testing,
> analysis, and development activities without exposing real personal or
> sensitive information.
> * Enhanced Data Sharing: Data masking allows us to share data with external
> parties, such as partners or third-party vendors, while protecting sensitive
> information. Masked data can be shared with confidence, as the original
> sensitive values are replaced with transformed or fictional data.
> * Employee Privacy: Data masking helps protect employee privacy by
> obfuscating sensitive employee information, such as Social Security numbers
> or salary details, in databases or HR systems. This safeguards employees'
> personal data from unauthorized access or internal misuse.
> * Insider Threat Mitigation: Data masking reduces the risk posed by insider
> threats, where authorized individuals intentionally or accidentally misuse or
> expose sensitive data. By masking data, even individuals who can access the
> dataset will only see masked or fictional values for columns they are not
> authorized to read, limiting the potential damage caused by internal
> security breaches.
> * Flexibility and Granularity: Data masking techniques offer flexibility and
> granularity in selecting the level of masking required for different types of
> data. We can determine the appropriate masking technique based on the
> sensitivity of the data and the specific use case.
> Overall, data masking is essential for protecting sensitive data, maintaining
> compliance with regulations, mitigating data breach risks, and enabling
> secure data sharing and testing environments. It plays a crucial role in
> ensuring data privacy and maintaining the trust of individuals whose data is
> being processed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)