wgtmac commented on code in PR #39623:
URL: https://github.com/apache/arrow/pull/39623#discussion_r1454312200
##########
cpp/src/parquet/encryption/encryption_internal.cc:
##########
@@ -24,6 +24,7 @@
#include <algorithm>
#include <iostream>
#include <memory>
+#include <mutex>
Review Comment:
This is unnecessary.
##########
cpp/src/parquet/encryption/internal_file_decryptor.h:
##########
@@ -88,11 +89,10 @@ class InternalFileDecryptor {
const std::string& aad = "");
private:
+ std::mutex mutex_;
FileDecryptionProperties* properties_;
// Concatenation of aad_prefix (if exists) and aad_file_unique
std::string file_aad_;
- std::map<std::string, std::shared_ptr<Decryptor>> column_data_map_;
- std::map<std::string, std::shared_ptr<Decryptor>> column_metadata_map_;
std::shared_ptr<Decryptor> footer_metadata_decryptor_;
Review Comment:
The Decryptor implementation is not thread-safe, but fortunately
`footer_metadata_decryptor_` and `footer_data_decryptor_` are never accessed
concurrently.
##########
cpp/src/parquet/encryption/internal_file_decryptor.h:
##########
@@ -88,11 +89,10 @@ class InternalFileDecryptor {
const std::string& aad = "");
private:
+ std::mutex mutex_;
Review Comment:
Is it used to guard `all_decryptors_` at line 103? Perhaps it would be much
clearer if we move it close to all_decryptors_ and add a comment
`GUARDED_BY(mutex_)` to all_decryptors_
##########
cpp/src/parquet/encryption/internal_file_decryptor.cc:
##########
@@ -138,6 +139,9 @@ std::shared_ptr<Decryptor>
InternalFileDecryptor::GetFooterDecryptor(
// Create both data and metadata decryptors to avoid redundant retrieval of
key
// from the key_retriever.
+
+ std::lock_guard<std::mutex> lock(mutex_); // Lock the critical section
Review Comment:
Should we add a block to limit the scope of the lock from line 146 to 149
only?
##########
cpp/src/arrow/dataset/large_row_parquet_encrypt_test.cc:
##########
@@ -0,0 +1,201 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <arrow/api.h>
+#include <arrow/dataset/dataset.h>
+#include <arrow/dataset/file_base.h>
+#include <arrow/dataset/file_parquet.h>
+#include <arrow/dataset/parquet_encryption_config.h>
+#include <arrow/filesystem/localfs.h> // Include the header for
LocalFileSystem
+#include <arrow/filesystem/mockfs.h>
+#include <arrow/io/api.h>
+#include <arrow/status.h>
+#include <arrow/table.h>
+#include <arrow/testing/gtest_util.h>
+#include <arrow/testing/random.h>
+#include <parquet/arrow/reader.h>
+#include <parquet/encryption/crypto_factory.h>
+#include <parquet/encryption/encryption.h>
+#include <parquet/encryption/encryption_internal.h>
+#include <parquet/encryption/kms_client.h>
+#include <parquet/encryption/test_in_memory_kms.h>
+#include <cmath>
+#include <memory>
+#include <string_view>
+#include "gtest/gtest.h"
+
+#include <arrow/util/base64.h>
+#include <filesystem>
+
+#define ROW_COUNT 32769;
Review Comment:
Please use `constexpr int kRowCount`
##########
cpp/src/parquet/encryption/internal_file_decryptor.cc:
##########
@@ -200,21 +190,13 @@ std::shared_ptr<Decryptor>
InternalFileDecryptor::GetColumnDecryptor(
throw HiddenColumnException("HiddenColumnException, path=" + column_path);
}
- // Create both data and metadata decryptors to avoid redundant retrieval of
key
- // using the key_retriever.
int key_len = static_cast<int>(column_key.size());
- auto aes_metadata_decryptor = encryption::AesDecryptor::Make(
- algorithm_, key_len, /*metadata=*/true, &all_decryptors_);
- auto aes_data_decryptor = encryption::AesDecryptor::Make(
- algorithm_, key_len, /*metadata=*/false, &all_decryptors_);
- column_metadata_map_[column_path] = std::make_shared<Decryptor>(
- aes_metadata_decryptor, column_key, file_aad_, aad, pool_);
- column_data_map_[column_path] =
- std::make_shared<Decryptor>(aes_data_decryptor, column_key, file_aad_,
aad, pool_);
+ std::lock_guard<std::mutex> lock(mutex_);
+ auto aes_decryptor =
+ encryption::AesDecryptor::Make(algorithm_, key_len, metadata,
&all_decryptors_);
- if (metadata) return column_metadata_map_[column_path];
- return column_data_map_[column_path];
+ return std::make_shared<Decryptor>(aes_decryptor, column_key, file_aad_,
aad, pool_);
Review Comment:
```suggestion
return std::make_shared<Decryptor>(std::move(aes_decryptor), column_key,
file_aad_, aad, pool_);
```
##########
cpp/src/parquet/encryption/internal_file_decryptor.cc:
##########
@@ -61,6 +61,7 @@
InternalFileDecryptor::InternalFileDecryptor(FileDecryptionProperties* propertie
}
void InternalFileDecryptor::WipeOutDecryptionKeys() {
+ std::lock_guard<std::mutex> lock(mutex_);
Review Comment:
CMIW, this is only called once in `SerializedFile::Close()` so it does not
require the lock here.
##########
cpp/src/parquet/encryption/internal_file_decryptor.cc:
##########
@@ -168,21 +172,7 @@ std::shared_ptr<Decryptor>
InternalFileDecryptor::GetColumnDataDecryptor(
std::shared_ptr<Decryptor> InternalFileDecryptor::GetColumnDecryptor(
const std::string& column_path, const std::string& column_key_metadata,
const std::string& aad, bool metadata) {
- std::string column_key;
Review Comment:
This may introduce performance regression. But that's not my concern here
due to keeping the cache in single-threaded mode may require more complicated
logic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]