ggershinsky commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r485535843
##########
File path: cpp/src/parquet/key_toolkit.h
##########
@@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "parquet/key_encryption_key.h"
+#include "parquet/kms_client.h"
+#include "parquet/kms_client_factory.h"
+#include "parquet/platform.h"
+#include "parquet/two_level_cache_with_expiration.h"
+
+namespace parquet {
+namespace encryption {
+
+class KeyWithMasterID {
+ public:
+  KeyWithMasterID(const std::string& key_bytes, const std::string& master_id)
+      : key_bytes_(key_bytes), master_id_(master_id) {}
+
+  const std::string& data_key() const { return key_bytes_; }
+  const std::string& master_id() const { return master_id_; }
+
+ private:
+  std::string key_bytes_;
+  std::string master_id_;
+};
+
+class PARQUET_EXPORT KeyToolkit {
+ public:
+  class KmsClientCache {
+   public:
+    static KmsClientCache& GetInstance() {
+      static KmsClientCache instance;

Review comment:
The cache entries are stored/retrieved by the access tokens, which enables multi-tenancy. Each caller is able to access only the objects cached with its token, using the token as the cache key.
[btw, the validity of the token is verified against the KMS, but this doesn't relate directly to the current discussion]. Also, the technical challenges of using a singleton can be overcome with reasonable effort. So I think that the current design is sound.

However, I can understand the desire to use an alternative to singletons where possible, and it should indeed be possible for this particular encryption interface. I agree that we could provide users with an explicit API to create cache instances and pass them to parquet. But such an API would be difficult to explain, and would likely lead to many users running without caches, meaning without KMS RPC optimization (each thread would interact with a KMS server for every key).

So let me propose the following. This encryption interface has a class called PropertiesDrivenCryptoFactory. It's an anchor: one factory instance would typically be created by a user / tenant, and used to create crypto objects for different files / threads. Therefore, we can make the cache a regular, non-singleton member of this class. It will then be shared across the threads, and will be limited to this tenant. Naturally, this will need to be well documented for the PropertiesDrivenCryptoFactory class (already part of the public API of encryption).

What do you think?
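
To make the proposal concrete, here is a minimal sketch of a factory that owns its cache as a regular member, with entries keyed by access token. All names in it (SketchKmsClient, SketchKmsClientCache, SketchCryptoFactory, GetKmsClient, kms_connections) are hypothetical stand-ins, not the actual classes in this PR, and entry expiration is left out for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a KMS client (not the real KmsClient interface).
struct SketchKmsClient {
  explicit SketchKmsClient(std::string token) : access_token(std::move(token)) {}
  std::string access_token;
};

// Thread-safe cache keyed by access token. A caller can only reach the
// entry that was stored under its own token. Expiration is omitted here.
class SketchKmsClientCache {
 public:
  using Factory =
      std::function<std::shared_ptr<SketchKmsClient>(const std::string&)>;

  std::shared_ptr<SketchKmsClient> GetOrCreate(const std::string& access_token,
                                               const Factory& make_client) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = clients_.find(access_token);
    if (it != clients_.end()) {
      return it->second;  // cache hit: no new KMS connection
    }
    std::shared_ptr<SketchKmsClient> client = make_client(access_token);
    assert(client != nullptr);
    clients_.emplace(access_token, client);
    return client;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return clients_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<SketchKmsClient>> clients_;
};

// Hypothetical stand-in for the factory: the cache is a plain member, so it
// is shared by every thread using this factory instance and by nobody else.
class SketchCryptoFactory {
 public:
  std::shared_ptr<SketchKmsClient> GetKmsClient(const std::string& access_token) {
    return cache_.GetOrCreate(access_token, [this](const std::string& token) {
      ++kms_connections_;  // counts how often a client had to be created
      return std::make_shared<SketchKmsClient>(token);
    });
  }

  int kms_connections() const { return kms_connections_.load(); }

 private:
  SketchKmsClientCache cache_;
  std::atomic<int> kms_connections_{0};
};
```

With this shape, two calls on the same factory with the same token return the same client, while a second factory instance (another tenant) gets its own client even for an identical token, since nothing is process-wide.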