ggershinsky commented on a change in pull request #8023:
URL: https://github.com/apache/arrow/pull/8023#discussion_r485535843



##########
File path: cpp/src/parquet/key_toolkit.h
##########
@@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+#include "parquet/platform.h"
+#include "parquet/two_level_cache_with_expiration.h"
+
+namespace parquet {
+namespace encryption {
+
+class KeyWithMasterID {
+ public:
+  KeyWithMasterID(const std::string& key_bytes, const std::string& master_id)
+      : key_bytes_(key_bytes), master_id_(master_id) {}
+
+  const std::string& data_key() const { return key_bytes_; }
+  const std::string& master_id() const { return master_id_; }
+
+ private:
+  std::string key_bytes_;
+  std::string master_id_;
+};
+
+class PARQUET_EXPORT KeyToolkit {
+ public:
+  class KmsClientCache {
+   public:
+    static KmsClientCache& GetInstance() {
+      static KmsClientCache instance;

Review comment:
       Cache entries are stored and retrieved by access token, which enables 
multi-tenancy: each caller uses its token as the cache key, so it can access 
only the objects cached under that token. (The validity of the token is 
verified against the KMS, but that isn't directly related to the current 
discussion.) Also, the technical challenges of using a singleton can be 
surmounted with reasonable effort, so I think that the current design is 
sound.
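
To illustrate the token-keyed lookup described above, here is a minimal sketch. It is not the code in this PR: `SketchKmsClient` and `TokenKeyedClientCache` are hypothetical names, and expiration and token validation are left out.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a KMS client (not parquet::encryption::KmsClient).
struct SketchKmsClient {
  explicit SketchKmsClient(std::string id) : kms_instance_id(std::move(id)) {}
  std::string kms_instance_id;
};

// Hypothetical two-level cache: the outer key is the caller's access token,
// the inner key is the KMS instance id. A caller can only reach entries
// stored under the token it presents.
class TokenKeyedClientCache {
 public:
  std::shared_ptr<SketchKmsClient> GetOrCreate(const std::string& access_token,
                                               const std::string& kms_instance_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    // operator[] creates an empty per-token map on first use.
    auto& per_token = entries_[access_token];
    auto it = per_token.find(kms_instance_id);
    if (it == per_token.end()) {
      it = per_token
               .emplace(kms_instance_id,
                        std::make_shared<SketchKmsClient>(kms_instance_id))
               .first;
    }
    return it->second;
  }

 private:
  using PerTokenMap =
      std::unordered_map<std::string, std::shared_ptr<SketchKmsClient>>;

  std::mutex mutex_;
  std::unordered_map<std::string, PerTokenMap> entries_;
};
```

In this sketch, two calls with the same token and KMS instance id return the same client object, while a call with a different token gets its own entry.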
   
   However, I can understand the desire to avoid singletons where possible, 
and it should indeed be possible in the case of this particular encryption 
interface.
   I agree that we could provide users with an explicit API to create cache 
instances and pass them to parquet. But such an API would be difficult to 
explain, and would likely lead to many users running without caches, and 
therefore without KMS RPC optimization (each thread would interact with the 
KMS server for every key).
   So let me propose the following. This encryption interface has a class 
called PropertiesDrivenCryptoFactory. It's an anchor: one factory instance 
would typically be created by a user / tenant, and used to create crypto 
objects for different files / threads. Therefore, we can make the cache a 
regular, non-singleton member of this class. It will then be shared across 
threads, and will be limited to this tenant. Naturally, this will need to be 
well documented for the PropertiesDrivenCryptoFactory class (already part of 
the public encryption API).
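
A rough sketch of the shape I have in mind, under stated assumptions: `SketchCryptoFactory` and `SketchClient` are hypothetical stand-ins, not the real PropertiesDrivenCryptoFactory or KmsClient, and cache expiration is omitted.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a KMS client.
struct SketchClient {
  explicit SketchClient(const std::string& id) : kms_instance_id(id) {}
  std::string kms_instance_id;
};

// Hypothetical factory that owns its cache as a regular member instead of
// reaching for a process-wide singleton. One instance per user / tenant.
class SketchCryptoFactory {
 public:
  // Safe to call from several threads that share this factory instance.
  std::shared_ptr<SketchClient> GetKmsClient(const std::string& kms_instance_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = kms_clients_.find(kms_instance_id);
    if (it == kms_clients_.end()) {
      it = kms_clients_
               .emplace(kms_instance_id,
                        std::make_shared<SketchClient>(kms_instance_id))
               .first;
    }
    return it->second;
  }

 private:
  std::mutex mutex_;
  // The cache lives and dies with the factory, so it is scoped to the tenant.
  std::unordered_map<std::string, std::shared_ptr<SketchClient>> kms_clients_;
};
```

With this layout, threads that share one factory reuse the same cached client, and two factories (two tenants) never see each other's entries.
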
   What do you think?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

