[ 
https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738130#comment-17738130
 ] 

ASF GitHub Bot commented on PARQUET-2223:
-----------------------------------------

shangxinli commented on code in PR #1112:
URL: https://github.com/apache/parquet-mr/pull/1112#discussion_r1245235188


##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/HiddenColumnException.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto;
+
+import java.util.Arrays;
+
+import org.apache.parquet.ParquetRuntimeException;
+
+/**
+ * Reader doesn't have key for encrypted column,
+ * but tries to access its contents
+ *
+ */
+public class HiddenColumnException extends ParquetRuntimeException {
+  private static final long serialVersionUID = 1L;
+  private static final String configHelpText = " If returning 'null' for encrypted columns is acceptable," +
+    " please add 'set parquet.crypto.read.masked.value.enabled=true;' before your query" +

Review Comment:
   You could reword it to say the user doesn't have permission. If returning 
null for this non-permitted column is acceptable, set the config 
'parquet.crypto.read.masked.value.enabled' to 'true'.
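
For illustration only, here is a minimal sketch of how a permission-oriented message along these lines might be assembled. It is not the code in the PR: the class name `HiddenColumnExceptionSketch`, the constructor parameters, the `buildMessage` helper, and the exact wording are all assumptions, and `RuntimeException` stands in for `ParquetRuntimeException` so the snippet compiles without parquet-mr on the classpath. Only the config key is taken from the comment above.

```java
import java.util.Arrays;

// Hypothetical stand-in for the exception under review. The real class
// extends ParquetRuntimeException; RuntimeException is used here only to
// keep the sketch self-contained.
class HiddenColumnExceptionSketch extends RuntimeException {
  private static final long serialVersionUID = 1L;

  // Config key as quoted in the review comment.
  static final String MASKED_VALUE_CONFIG = "parquet.crypto.read.masked.value.enabled";

  // Assumed wording: states the missing permission first, then the opt-in.
  private static final String CONFIG_HELP_TEXT =
      " If returning null for this non-permitted column is acceptable,"
          + " set config '" + MASKED_VALUE_CONFIG + "' to 'true'.";

  // Assumed constructor shape: a column path and the file being read.
  HiddenColumnExceptionSketch(String[] columnPath, String filePath) {
    super(buildMessage(columnPath, filePath));
  }

  // Builds the full message so the wording can be checked in isolation.
  static String buildMessage(String[] columnPath, String filePath) {
    return "User does not have permission to read column "
        + Arrays.toString(columnPath) + " in file " + filePath + "."
        + CONFIG_HELP_TEXT;
  }
}
```

Keeping the key in a single constant means the help text and any code that reads the setting cannot drift apart, which may be worth considering regardless of the final wording.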





> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Jiashen Zhang
>            Priority: Major
>
> h1. Background
> h2. What is Data Masking?
> Data masking is a technique used to protect sensitive data by replacing it 
> with modified or obscured values. The purpose of data masking is to ensure 
> that sensitive information, such as Personally Identifiable Information 
> (PII), remains hidden from unauthorized users while allowing authorized users 
> to perform their tasks.
> Here are a few key points about data masking:
>  * Protection of Sensitive Data: Data masking helps to safeguard sensitive 
> data, such as Social Security numbers, credit card numbers, names, addresses, 
> and other personally identifiable information. By applying masking 
> techniques, the original values are replaced with fictional or transformed 
> data that retains the format and structure but removes any identifiable 
> information.
>  * Controlled Access: Data masking enables controlled access to sensitive 
> data. Authorized users, typically with appropriate permissions, can access 
> the unmasked or original data, while unauthorized users or users without the 
> necessary permissions will only see the masked data.
>  * Various Masking Techniques: There are different masking techniques 
> available, depending on the specific data privacy requirements and use cases. 
> Some commonly used techniques include:
>  ** Nullification: Replacing original data with NULL values.
>  ** Randomization: Replacing sensitive data with randomly generated values.
>  ** Substitution: Replacing sensitive data with fictional but realistic 
> values.
>  ** Hashing: Transforming sensitive data into irreversible hashed values.
>  ** Redaction: Removing or masking specific parts of sensitive data while 
> retaining other non-sensitive information.
>  * Compliance and Data Privacy: Data masking is often employed to comply with 
> data protection regulations and maintain data privacy. By masking sensitive 
> data, we can reduce the risk of data breaches and unauthorized access while 
> still allowing legitimate users to perform their tasks.
>  * Maintaining Data Consistency: Data masking techniques aim to maintain data 
> consistency and integrity by ensuring that masked data retains the original 
> data's format, structure, and relationships. This allows applications and 
> processes that rely on the data to continue functioning correctly.
> h2. Why do we need it?
> Data masking serves several important purposes and provides numerous 
> benefits. Here are some reasons why we need data masking:
>  * Data Privacy and Compliance: Data masking helps us comply with data 
> privacy regulations such as the General Data Protection Regulation (GDPR) and 
> the Health Insurance Portability and Accountability Act (HIPAA). These 
> regulations require us to protect sensitive data and ensure that it is only 
> accessible to authorized individuals. Data masking enables us to comply with 
> these regulations by de-identifying sensitive data.
>  * Minimize Data Exposure: By masking sensitive data, we can reduce the risk 
> of data breaches and unauthorized access. If a security breach occurs, the 
> exposed data will be meaningless to unauthorized users due to the masking. 
> This helps protect individuals' privacy and prevents misuse of sensitive 
> information.
>  * Secure Testing and Development Environments: Data masking is particularly 
> useful in creating secure testing and development environments. By masking 
> sensitive data, we can use realistic but fictional data for testing, 
> analysis, and development activities without exposing real personal or 
> sensitive information.
>  * Enhanced Data Sharing: Data masking allows us to share data with external 
> parties, such as partners or third-party vendors, while protecting sensitive 
> information. Masked data can be shared with confidence, as the original 
> sensitive values are replaced with transformed or fictional data.
>  * Employee Privacy: Data masking helps protect employee privacy by 
> obfuscating sensitive employee information, such as Social Security numbers 
> or salary details, in databases or HR systems. This safeguards employees' 
> personal data from unauthorized access or internal misuse.
>  * Insider Threat Mitigation: Data masking reduces the risk posed by insider 
> threats, where authorized individuals intentionally or accidentally misuse or 
> expose sensitive data. By masking data, even individuals with access to the 
> data will only see masked or fictional values, limiting the potential damage 
> caused by internal security breaches.
>  * Flexibility and Granularity: Data masking techniques offer flexibility and 
> granularity in selecting the level of masking required for different types of 
> data. We can determine the appropriate masking technique based on the 
> sensitivity of the data and the specific use case.
> Overall, data masking is essential for protecting sensitive data, maintaining 
> compliance with regulations, mitigating data breach risks, and enabling 
> secure data sharing and testing environments. It plays a crucial role in 
> ensuring data privacy and maintaining the trust of individuals whose data is 
> being processed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
