[ https://issues.apache.org/jira/browse/PARQUET-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125902#comment-17125902 ]

ASF GitHub Bot commented on PARQUET-1807:
-----------------------------------------

andersonm-ibm commented on a change in pull request #782:
URL: https://github.com/apache/parquet-mr/pull/782#discussion_r435237663



##########
File path: 
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java
##########
@@ -0,0 +1,660 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.crypto.ColumnDecryptionProperties;
+import org.apache.parquet.crypto.ColumnEncryptionProperties;
+import org.apache.parquet.crypto.FileDecryptionProperties;
+import org.apache.parquet.crypto.FileEncryptionProperties;
+import org.apache.parquet.crypto.ParquetCipher;
+import org.apache.parquet.crypto.DecryptionKeyRetriever;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+import org.apache.parquet.filter2.compat.FilterCompat;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.example.GroupReadSupport;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Types;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ErrorCollector;
+import org.junit.rules.TemporaryFolder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.parquet.hadoop.ParquetFileWriter.Mode.OVERWRITE;
+import static org.apache.parquet.hadoop.metadata.CompressionCodecName.UNCOMPRESSED;
+import static org.apache.parquet.schema.MessageTypeParser.parseMessageType;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.BOOLEAN;
+import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT32;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+/*
+ * This file contains samples for writing and reading encrypted Parquet files in different
+ * encryption and decryption configurations. The samples have the following goals:
+ * 1) Demonstrate usage of different options for data encryption and decryption.
+ * 2) Produce encrypted files for interoperability tests with other readers (e.g.
+ *    parquet-cpp) that support encryption.
+ * 3) Produce encrypted files with a plaintext footer, for testing the ability of legacy
+ *    readers to parse the footer and read the unencrypted columns.
+ * 4) Perform interoperability tests with other writers (e.g. parquet-cpp), by reading
+ *    encrypted files produced by these writers.
+ *
+ * The write sample produces a number of parquet files, each encrypted with a different
+ * encryption configuration as described below.
+ * The name of each file is in the form of:
+ * <encryption configuration name>.parquet.encrypted.
+ *
+ * The read sample creates a set of decryption configurations and then uses each of them
+ * to read all encrypted files in the input directory.
+ *
+ * The different encryption and decryption configurations are listed below.
+ *
+ * A detailed description of the Parquet Modular Encryption specification can be found
+ * here:
+ * https://github.com/apache/parquet-format/blob/encryption/Encryption.md
+ *
+ * The write sample creates files with four columns in the following
+ * encryption configurations:
+ *
+ *  UNIFORM_ENCRYPTION:             Encrypt all columns and the footer with the same key
+ *                                  (uniform encryption).
+ *  ENCRYPT_COLUMNS_AND_FOOTER:     Encrypt two columns and the footer, with different
+ *                                  keys.
+ *  ENCRYPT_COLUMNS_PLAINTEXT_FOOTER: Encrypt two columns, with different keys.
+ *                                  Do not encrypt the footer (to enable legacy readers)
+ *                                  - plaintext footer mode.
+ *  ENCRYPT_COLUMNS_AND_FOOTER_AAD: Encrypt two columns and the footer, with different
+ *                                  keys. Supply an aad_prefix for file identity
+ *                                  verification.
+ *  ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE: Encrypt two columns and the footer,
+ *                                  with different keys. Supply an aad_prefix, and call
+ *                                  disable_aad_prefix_storage to prevent file
+ *                                  identity storage in the file metadata.
+ *  ENCRYPT_COLUMNS_AND_FOOTER_CTR: Encrypt two columns and the footer, with different
+ *                                  keys. Use the alternative (AES_GCM_CTR_V1) algorithm.
+ *  NO_ENCRYPTION:                  Do not encrypt anything.
+ *
+ *
+ * The read sample uses each of the following decryption configurations to read every
+ * encrypted file in the input directory:
+ *
+ *  DECRYPT_WITH_KEY_RETRIEVER:     Decrypt using a key retriever that holds the keys of
+ *                                  the two encrypted columns and the footer key.
+ *  DECRYPT_WITH_KEY_RETRIEVER_AAD: Decrypt using a key retriever that holds the keys of
+ *                                  the two encrypted columns and the footer key.
+ *                                  Supplies the aad_prefix to verify file identity.
+ *  DECRYPT_WITH_EXPLICIT_KEYS:     Decrypt using explicit column and footer keys
+ *                                  (instead of the key retrieval callback).
+ *  NO_DECRYPTION:                  Do not decrypt anything.
+ */
+public class TestEncryptionOptions {
+  private static final Logger LOG = LoggerFactory.getLogger(TestEncryptionOptions.class);
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  @Rule
+  public ErrorCollector errorCollector = new ErrorCollector();
+
+  private static final byte[] FOOTER_ENCRYPTION_KEY = "0123456789012345".getBytes();
+  private static final byte[] COLUMN_ENCRYPTION_KEY1 = "1234567890123450".getBytes();
+  private static final byte[] COLUMN_ENCRYPTION_KEY2 = "1234567890123451".getBytes();
+  private static final String FOOTER_ENCRYPTION_KEY_ID = "kf";
+  private static final String COLUMN_ENCRYPTION_KEY1_ID = "kc1";
+  private static final String COLUMN_ENCRYPTION_KEY2_ID = "kc2";
+  private static final String AAD_PREFIX_STRING = "tester";
+  private static final String BOOLEAN_FIELD_NAME = "boolean_field";
+  private static final String INT32_FIELD_NAME = "int32_field";
+  private static final String FLOAT_FIELD_NAME = "float_field";
+  private static final String DOUBLE_FIELD_NAME = "double_field";
+
+  public enum EncryptionConfiguration {
+    UNIFORM_ENCRYPTION("UNIFORM_ENCRYPTION"),
+    ENCRYPT_COLUMNS_AND_FOOTER("ENCRYPT_COLUMNS_AND_FOOTER"),
+    ENCRYPT_COLUMNS_PLAINTEXT_FOOTER("ENCRYPT_COLUMNS_PLAINTEXT_FOOTER"),
+    ENCRYPT_COLUMNS_AND_FOOTER_AAD("ENCRYPT_COLUMNS_AND_FOOTER_AAD"),
+    ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE("ENCRYPT_COLUMNS_AND_FOOTER_DISABLE_AAD_STORAGE"),
+    ENCRYPT_COLUMNS_AND_FOOTER_CTR("ENCRYPT_COLUMNS_AND_FOOTER_CTR"),
+    NO_ENCRYPTION("NO_ENCRYPTION");
+
+    private final String configurationName;
+
+    EncryptionConfiguration(String configurationName) {
+      this.configurationName = configurationName;
+    }
+
+    @Override
+    public String toString() {
+      return configurationName;
+    }
+  }
+
+
+  public enum DecryptionConfiguration {
+    DECRYPT_WITH_KEY_RETRIEVER("DECRYPT_WITH_KEY_RETRIEVER"),
+    DECRYPT_WITH_KEY_RETRIEVER_AAD("DECRYPT_WITH_KEY_RETRIEVER_AAD"),
+    DECRYPT_WITH_EXPLICIT_KEYS("DECRYPT_WITH_EXPLICIT_KEYS"),
+    NO_DECRYPTION("NO_DECRYPTION");
+
+    private final String configurationName;
+
+    DecryptionConfiguration(String configurationName) {
+      this.configurationName = configurationName;
+    }
+
+    @Override
+    public String toString() {
+      return configurationName;
+    }
+  }
+
+  @Test
+  public void testWriteReadEncryptedParquetFiles() throws IOException {
+    Path rootPath = new Path(temporaryFolder.getRoot().getPath());
+    LOG.info("======== testWriteReadEncryptedParquetFiles {} ========", rootPath.toString());
+    byte[] AADPrefix = AAD_PREFIX_STRING.getBytes(StandardCharsets.UTF_8);
+    // This map holds the various encryption configurations.
+    Map<EncryptionConfiguration, FileEncryptionProperties> encryptionPropertiesMap =
+      getEncryptionConfigurations(AADPrefix);
+    testWriteEncryptedParquetFiles(rootPath, encryptionPropertiesMap);
+    // This map holds the various decryption configurations.
+    Map<DecryptionConfiguration, FileDecryptionProperties> decryptionPropertiesMap =
+      getDecryptionConfigurations(AADPrefix);
+    testReadEncryptedParquetFiles(rootPath, decryptionPropertiesMap);
+  }
+
+  @Test
+  public void testInteropReadEncryptedParquetFiles() throws IOException {
+    Path rootPath = new Path("submodules/parquet-testing/data");
+    LOG.info("======== testInteropReadEncryptedParquetFiles {} ========", rootPath.toString());
+    byte[] AADPrefix = AAD_PREFIX_STRING.getBytes(StandardCharsets.UTF_8);
+    // This map holds the various decryption configurations.
+    Map<DecryptionConfiguration, FileDecryptionProperties> decryptionPropertiesMap =
+      getDecryptionConfigurations(AADPrefix);
+    testReadEncryptedParquetFiles(rootPath, decryptionPropertiesMap);
+  }
+
+  private void testWriteEncryptedParquetFiles(Path root, Map<EncryptionConfiguration, FileEncryptionProperties> encryptionPropertiesMap) throws IOException {
+    Configuration conf = new Configuration();
+    int numberOfEncryptionModes = encryptionPropertiesMap.size();
+
+    MessageType schema = parseMessageType(
+      "message test { "
+        + "required boolean " + BOOLEAN_FIELD_NAME + "; "
+        + "required int32 " + INT32_FIELD_NAME + "; "
+        + "required float " + FLOAT_FIELD_NAME + "; "
+        + "required double " + DOUBLE_FIELD_NAME + "; "
+        + "} ");
+
+    GroupWriteSupport.setSchema(schema, conf);
+    SimpleGroupFactory f = new SimpleGroupFactory(schema);
+
+
+    for (Map.Entry<EncryptionConfiguration, FileEncryptionProperties> encryptionConfigurationEntry : encryptionPropertiesMap.entrySet()) {
+      EncryptionConfiguration encryptionConfiguration = encryptionConfigurationEntry.getKey();
+      Path file = new Path(root, encryptionConfiguration.toString() + ".parquet.encrypted");
+
+      LOG.info("\nWrite " + file.toString());
+      ParquetWriter<Group> writer = ExampleParquetWriter.builder(file)
+        .withWriteMode(OVERWRITE)
+        .withType(schema)
+        .withEncryption(encryptionConfigurationEntry.getValue())
+        .build();
+
+      for (int i = 0; i < 100; i++) {
+        boolean expect = (i % 2) == 0;
+        float float_val = (float) i * 1.1f;
+        double double_val = (i * 1.1111111);
+
+        writer.write(
+          f.newGroup()
+            .append(BOOLEAN_FIELD_NAME, expect)
+            .append(INT32_FIELD_NAME, i)
+            .append(FLOAT_FIELD_NAME, float_val)
+            .append(DOUBLE_FIELD_NAME, double_val));
+
+      }
+      writer.close();
+    }
+  }
+
+  private void testReadEncryptedParquetFiles(Path root, Map<DecryptionConfiguration, FileDecryptionProperties> decryptionPropertiesMap) throws IOException {
+    Configuration conf = new Configuration();
+
+    for (Map.Entry<DecryptionConfiguration, FileDecryptionProperties> decryptionConfigurationEntry : decryptionPropertiesMap.entrySet()) {
+      DecryptionConfiguration decryptionConfiguration = decryptionConfigurationEntry.getKey();
+      LOG.info("==> Decryption configuration {}", decryptionConfiguration);
+      FileDecryptionProperties fileDecryptionProperties = decryptionConfigurationEntry.getValue();
+
+      File folder = new File(root.toString());
+      File[] listOfFiles = folder.listFiles();
+
+      for (int fileNum = 0; fileNum < listOfFiles.length; fileNum++) {
+        Path file = new Path(listOfFiles[fileNum].getAbsolutePath());
+        if (!file.getName().endsWith("parquet.encrypted")) { // Skip non-encrypted files
+          continue;
+        }
+        EncryptionConfiguration encryptionConfiguration = getEncryptionConfigurationFromFilename(file.getName());
+        if (null == encryptionConfiguration) {
+          continue;
+        }
+        LOG.info("--> Read file {} {}", file.toString(), encryptionConfiguration);
+
+        // Read only the non-encrypted columns
+        if ((decryptionConfiguration == DecryptionConfiguration.NO_DECRYPTION) &&
+          (encryptionConfiguration == EncryptionConfiguration.ENCRYPT_COLUMNS_PLAINTEXT_FOOTER)) {
+          conf.set("parquet.read.schema", Types.buildMessage()
+            .required(BOOLEAN).named(BOOLEAN_FIELD_NAME)
+            .required(INT32).named(INT32_FIELD_NAME)
+            .named("FormatTestObject").toString());
+        }
+
+        ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), file)
+          .withConf(conf)
+          .withDecryption(fileDecryptionProperties)
+          .build();
+
+        try {
+          for (int i = 0; i < 500; i++) {

Review comment:
       Thanks, Gabor, good catch!
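For readers following the review: the DECRYPT_WITH_KEY_RETRIEVER configurations described in the file comment hinge on a key-retriever callback that maps the key metadata stored in the file back to key bytes. The following is a minimal, self-contained sketch of that pattern; the class and method names here are illustrative stand-ins (not the actual parquet-mr DecryptionKeyRetriever API), and the key metadata is assumed to be a UTF-8 key ID, matching the IDs used in the test ("kf", "kc1", "kc2").

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class InMemoryKeyRetriever {
  // Map from key ID (stored in the file as key metadata) to key bytes.
  private final Map<String, byte[]> keyMap = new HashMap<>();

  public InMemoryKeyRetriever putKey(String keyId, byte[] keyBytes) {
    keyMap.put(keyId, keyBytes);
    return this;
  }

  // The reader hands back the key metadata it found in the file; we treat
  // it as a UTF-8 key ID and look up the key. Unknown IDs yield null.
  public byte[] getKey(byte[] keyMetadata) {
    return keyMap.get(new String(keyMetadata, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    InMemoryKeyRetriever retriever = new InMemoryKeyRetriever()
        .putKey("kf", "0123456789012345".getBytes(StandardCharsets.UTF_8))
        .putKey("kc1", "1234567890123450".getBytes(StandardCharsets.UTF_8))
        .putKey("kc2", "1234567890123451".getBytes(StandardCharsets.UTF_8));
    byte[] footerKey = retriever.getKey("kf".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(footerKey, StandardCharsets.UTF_8));
  }
}
```

In the test itself this role is played by a mockito mock of DecryptionKeyRetriever; the sketch just shows the lookup contract the mock stubs out.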




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Encryption: Interop and Function test suite for Java version
> ------------------------------------------------------------
>
>                 Key: PARQUET-1807
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1807
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-mr
>            Reporter: Gidon Gershinsky
>            Assignee: Maya Anderson
>            Priority: Major
>              Labels: pull-request-available
>
> # Interop: test parquet-mr reading of encrypted files produced by parquet-cpp (fetched from parquet-testing)
>  # Function: test writing/reading in a number of encryption and decryption configurations
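The read path in the diff calls a getEncryptionConfigurationFromFilename helper that lies outside this hunk. Under the assumption that the write sample names each file after its configuration plus a ".parquet.encrypted" suffix (as the write loop in the diff does), a hypothetical version of that helper could look like the sketch below; the enum is a trimmed mirror of the test's EncryptionConfiguration, and the null return for unmatched names is an assumption.

```java
public class ConfigFromFilename {
  // Trimmed mirror of the test's EncryptionConfiguration enum.
  enum EncryptionConfiguration {
    UNIFORM_ENCRYPTION,
    ENCRYPT_COLUMNS_AND_FOOTER,
    NO_ENCRYPTION
  }

  // Hypothetical helper: strip the ".parquet.encrypted" suffix and look the
  // remainder up in the enum; return null for files that do not match
  // (e.g. interop files produced by other writers).
  static EncryptionConfiguration fromFilename(String filename) {
    String suffix = ".parquet.encrypted";
    if (!filename.endsWith(suffix)) {
      return null;
    }
    String name = filename.substring(0, filename.length() - suffix.length());
    try {
      return EncryptionConfiguration.valueOf(name);
    } catch (IllegalArgumentException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(fromFilename("UNIFORM_ENCRYPTION.parquet.encrypted"));
    System.out.println(fromFilename("data.parquet"));
  }
}
```

Returning null (rather than throwing) lets the read loop simply skip files whose configuration it does not recognize, which matches the `if (null == encryptionConfiguration) continue;` guard in the diff.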



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
