ggershinsky commented on a change in pull request #8023:
URL: https://github.com/apache/arrow/pull/8023#discussion_r483447447



##########
File path: cpp/src/parquet/file_key_wrapper.h
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <map>
+#include <memory>
+#include <string>
+
+#include "parquet/file_key_material_store.h"
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+
+namespace parquet {
+namespace encryption {
+
+class FileKeyWrapper {
+ public:
+  static constexpr int KEK_LENGTH = 16;
+  static constexpr int KEK_ID_LENGTH = 16;
+
+  static constexpr int RND_MAX_BYTES = 32;
+
+  FileKeyWrapper(std::shared_ptr<KmsClientFactory> kms_client_factory,
+                 const KmsConnectionConfig& kms_connection_config,
+                 std::shared_ptr<FileKeyMaterialStore> key_material_store,
+                 uint64_t cache_entry_lifetime, bool double_wrapping,
+                 bool is_wrap_locally);
+
+  std::string GetEncryptionKeyMetadata(const std::string& data_key,

Review comment:
       yep, I agree with you both, this is a bit confusing. Fortunately, this 
is not a public API, only an internal function; but still.. The "key metadata", 
returned here, refers to this term in the most basic sense - that is, a 
free-form byte array field, as defined in the encryption spec / thrift format. 
In the Java version, we simply return a "byte[]" here. On the other hand, the 
KeyMetadata class encapsulates a more detailed implementation of this term, 
that assumes the PKMT-specific structure, so it can not be used as the return 
parameter.
   Like @thamht4190 , I'd prefer the current approach, and agree with 
@emkornfield that it would be good to have a comment here (and in the next 
function). Something like 
   "Creates key_metadata field for a given data key, via wrapping the key with 
the master key"

##########
File path: cpp/src/parquet/file_key_wrapper.h
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <map>
+#include <memory>
+#include <string>
+
+#include "parquet/file_key_material_store.h"
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+
+namespace parquet {
+namespace encryption {
+
+class FileKeyWrapper {
+ public:
+  static constexpr int KEK_LENGTH = 16;
+  static constexpr int KEK_ID_LENGTH = 16;
+
+  static constexpr int RND_MAX_BYTES = 32;
+
+  FileKeyWrapper(std::shared_ptr<KmsClientFactory> kms_client_factory,
+                 const KmsConnectionConfig& kms_connection_config,
+                 std::shared_ptr<FileKeyMaterialStore> key_material_store,
+                 uint64_t cache_entry_lifetime, bool double_wrapping,
+                 bool is_wrap_locally);
+
+  std::string GetEncryptionKeyMetadata(const std::string& data_key,
+                                       const std::string& master_key_id,
+                                       bool is_footer_key);
+
+  std::string GetEncryptionKeyMetadata(const std::string& data_key,

Review comment:
       a comment, something like
   "Creates key_metadata field for a given data key, via wrapping the key with 
the master key. This function signature is only called from the key rotation 
procedure, where the key_id_in_file is already available (retrieved from the 
key material being rotated), so the function does not have to generate these 
ids."

##########
File path: cpp/src/parquet/key_toolkit.h
##########
@@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+#include "parquet/platform.h"
+#include "parquet/two_level_cache_with_expiration.h"
+
+namespace parquet {
+namespace encryption {
+
+class KeyWithMasterID {
+ public:
+  KeyWithMasterID(const std::string& key_bytes, const std::string& master_id)
+      : key_bytes_(key_bytes), master_id_(master_id) {}
+
+  const std::string& data_key() const { return key_bytes_; }
+  const std::string& master_id() const { return master_id_; }
+
+ private:
+  std::string key_bytes_;
+  std::string master_id_;
+};
+
+class PARQUET_EXPORT KeyToolkit {
+ public:
+  class KmsClientCache {
+   public:
+    static KmsClientCache& GetInstance() {
+      static KmsClientCache instance;

Review comment:
       There is some cost, associated with creating a KmsClient instance 
object, including a potential interaction with a remote KMS Server. Therefore, 
the cache is a singleton, so only one thread has to pay this cost, others will 
get the instance from the cache.

##########
File path: cpp/src/parquet/file_key_wrapper.h
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <map>
+#include <memory>
+#include <string>
+
+#include "parquet/file_key_material_store.h"
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+
+namespace parquet {
+namespace encryption {
+
+class FileKeyWrapper {
+ public:
+  static constexpr int KEK_LENGTH = 16;
+  static constexpr int KEK_ID_LENGTH = 16;
+
+  static constexpr int RND_MAX_BYTES = 32;
+
+  FileKeyWrapper(std::shared_ptr<KmsClientFactory> kms_client_factory,
+                 const KmsConnectionConfig& kms_connection_config,
+                 std::shared_ptr<FileKeyMaterialStore> key_material_store,
+                 uint64_t cache_entry_lifetime, bool double_wrapping,
+                 bool is_wrap_locally);
+
+  std::string GetEncryptionKeyMetadata(const std::string& data_key,
+                                       const std::string& master_key_id,
+                                       bool is_footer_key);
+
+  std::string GetEncryptionKeyMetadata(const std::string& data_key,
+                                       const std::string& master_key_id,
+                                       bool is_footer_key, std::string 
key_id_in_file);
+
+ private:
+  KeyEncryptionKey CreateKeyEncryptionKey(const std::string& master_key_id);
+
+  // A map of MEK_ID -> KeyEncryptionKey, for the current token
+  std::map<std::string, KeyEncryptionKey> kek_per_master_key_id_;
+
+  std::shared_ptr<KmsClient> kms_client_;
+  KmsConnectionConfig kms_connection_config_;
+  std::shared_ptr<FileKeyMaterialStore> key_material_store_;
+  uint64_t cache_entry_lifetime_;
+  bool double_wrapping_;
+  uint16_t key_counter_;

Review comment:
       the maximal value of this counter is the number of columns in a file (if 
every one is encrypted with a different key), so int16 is reasonable. Moreover, 
there is a deeper limit on this, described under 
https://github.com/apache/parquet-format/pull/114#discussion_r232941138 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to