pitrou commented on code in PR #44990: URL: https://github.com/apache/arrow/pull/44990#discussion_r1944980454
########## cpp/src/parquet/encryption/encryption_internal.cc: ########## @@ -52,25 +52,55 @@ constexpr int32_t kBufferSizeLength = 4; throw ParquetException("Couldn't init ALG decryption"); \ } -class AesEncryptor::AesEncryptorImpl { +class AesEncryptionContext { + public: + AesEncryptionContext(ParquetCipher::type alg_id, int32_t key_len, bool metadata, + bool write_length) { + openssl::EnsureInitialized(); + + length_buffer_length_ = write_length ? kBufferSizeLength : 0; + ciphertext_size_delta_ = length_buffer_length_ + kNonceLength; + if (metadata || (ParquetCipher::AES_GCM_V1 == alg_id)) { + aes_mode_ = kGcmMode; + ciphertext_size_delta_ += kGcmTagLength; + } else { + aes_mode_ = kCtrMode; + } + + if (16 != key_len && 24 != key_len && 32 != key_len) { + std::stringstream ss; + ss << "Wrong key length: " << key_len; + throw ParquetException(ss.str()); + } + + key_length_ = key_len; + } + + virtual ~AesEncryptionContext() = default; + + protected: + static inline std::function<void(EVP_CIPHER_CTX*)> ctx_deleter_ = + [](EVP_CIPHER_CTX* ctx) { EVP_CIPHER_CTX_free(ctx); }; Review Comment: That's wrapping a plain function pointer in a heavy `std::function` object, which seems superfluous here. How about something like: ```suggestion static void DeleteCipherContext(EVP_CIPHER_CTX* ctx) { EVP_CIPHER_CTX_free(ctx); } using CipherContext = std::unique_ptr<EVP_CIPHER_CTX, decltype(DeleteCipherContext)>; static CipherContext WrapCipherContext(EVP_CIPHER_CTX* ctx) { return CipherContext(ctx, DeleteCipherContext); } ``` ########## cpp/src/arrow/dataset/file_parquet_encryption_test.cc: ########## @@ -51,8 +53,15 @@ using arrow::internal::checked_pointer_cast; namespace arrow { namespace dataset { +// Tests come in these variations +enum EncryptionParam { + COLUMN_KEY, + UNIFORM, +}; + // Base class to test writing and reading encrypted dataset. -class DatasetEncryptionTestBase : public ::testing::Test { +class DatasetEncryptionTestBase + : public testing::TestWithParam<std::tuple<EncryptionParam, bool>> { Review Comment: I think this would be more readable with something like: ```c++ struct EncryptionTestParam { bool uniform_encryption; // false is using per-column keys bool parallel; }; class DatasetEncryptionTestBase : public testing::TestWithParam<EncryptionTestParam> { ``` ########## cpp/src/parquet/column_reader.cc: ########## @@ -333,14 +340,16 @@ void SerializedPageReader::InitDecryption() { // Prepare the AAD for quick update later. if (crypto_ctx_.data_decryptor != nullptr) { ARROW_DCHECK(!crypto_ctx_.data_decryptor->file_aad().empty()); + data_decryptor_ = std::make_shared<Decryptor>(*crypto_ctx_.data_decryptor); Review Comment: I think this means that we want to change: ```c++ std::shared_ptr<Decryptor> GetColumnDataDecryptor( const ColumnCryptoMetaData* crypto_metadata, InternalFileDecryptor* file_decryptor); ``` into: ```c++ std::function<std::shared_ptr<Decryptor>()> GetColumnDataDecryptor( const ColumnCryptoMetaData* crypto_metadata, InternalFileDecryptor* file_decryptor); ``` so that we can get a new Decryptor instance every time we need one. (same for the meta decryptor, of course) ########## cpp/src/parquet/column_reader.cc: ########## @@ -333,14 +340,16 @@ void SerializedPageReader::InitDecryption() { // Prepare the AAD for quick update later. if (crypto_ctx_.data_decryptor != nullptr) { ARROW_DCHECK(!crypto_ctx_.data_decryptor->file_aad().empty()); + data_decryptor_ = std::make_shared<Decryptor>(*crypto_ctx_.data_decryptor); Review Comment: So this is making a copy of the `Decryptor` object. But `Decryptor::aes_decryptor_` is a `std::shared_ptr<encryption::AesDecryptor>`, so this is merely going to increment the shared_ptr's reference count, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org