wjones127 commented on code in PR #34616: URL: https://github.com/apache/arrow/pull/34616#discussion_r1189022551
########## cpp/src/arrow/dataset/CMakeLists.txt: ########## @@ -96,6 +96,13 @@ if(ARROW_BUILD_STATIC AND WIN32) target_compile_definitions(arrow_dataset_static PUBLIC ARROW_DS_STATIC) endif() +if(ARROW_PARQUET + AND PARQUET_REQUIRE_ENCRYPTION + AND ARROW_DATASET) + # Add parquet to ARROW_TEST_STATIC_LINK_LIBS + list(APPEND ARROW_TEST_STATIC_LINK_LIBS parquet) +endif() + Review Comment: This should be already taken care of in the lines above: https://github.com/apache/arrow/blob/3fd881cac4476f1f7e6a531fe2864b1820de62ad/cpp/src/arrow/dataset/CMakeLists.txt#L56-L63 ```suggestion ``` ########## cpp/src/arrow/dataset/file_parquet.cc: ########## @@ -16,14 +16,12 @@ // under the License. #include "arrow/dataset/file_parquet.h" - Review Comment: Please don't remove these new lines. ########## cpp/src/arrow/dataset/dataset_encryption_test.cc: ########## @@ -0,0 +1,386 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include <string_view> +#include "arrow/api.h" +#include "arrow/array/builder_primitive.h" +#include "arrow/builder.h" +#include "arrow/dataset/api.h" +#include "arrow/dataset/parquet_encryption_config.h" +#include "arrow/dataset/partition.h" +#include "arrow/filesystem/mockfs.h" +#include "arrow/io/api.h" +#include "arrow/status.h" +#include "arrow/table.h" +#include "arrow/testing/gtest_util.h" +#include "gtest/gtest.h" +#include "parquet/arrow/reader.h" +#include "parquet/encryption/test_in_memory_kms.h" Review Comment: nit: I think we like to keep the includes sectioned with the corresponding header first (not in this file), then std library includes, then third-party includes, then internal includes. You can see this structure across our other files. ```suggestion #include <string_view> #include "gtest/gtest.h" #include "arrow/api.h" #include "arrow/array/builder_primitive.h" #include "arrow/builder.h" #include "arrow/dataset/api.h" #include "arrow/dataset/parquet_encryption_config.h" #include "arrow/dataset/partition.h" #include "arrow/filesystem/mockfs.h" #include "arrow/io/api.h" #include "arrow/status.h" #include "arrow/table.h" #include "arrow/testing/gtest_util.h" #include "parquet/arrow/reader.h" #include "parquet/encryption/test_in_memory_kms.h" ``` ########## cpp/src/parquet/CMakeLists.txt: ########## @@ -231,7 +231,8 @@ if(PARQUET_REQUIRE_ENCRYPTION) encryption/key_metadata.cc encryption/key_toolkit.cc encryption/key_toolkit_internal.cc - encryption/local_wrap_kms_client.cc) + encryption/local_wrap_kms_client.cc + encryption/test_in_memory_kms.cc) Review Comment: This source file is already included below. We shouldn't include test source files in the parquet library. ```suggestion encryption/local_wrap_kms_client.cc) ``` ########## cpp/src/arrow/dataset/file_parquet.cc: ########## Review Comment: It seems like the main goal of `PARQUET_REQUIRE_ENCRYPTION ` is to mask out parts that require additional dependencies, such as libssl or arrow json. So exposing the configuration types is okay. It looks like other parts of the encryption code base do this by swapping out the implementation (.cc) file for `encryption_internal.h`: https://github.com/apache/arrow/blob/c3acc91c3e21f225ee7cc3cb3f489f0153365f5a/cpp/src/parquet/CMakeLists.txt#L220-L237 ########## cpp/src/parquet/properties.h: ########## @@ -513,16 +535,16 @@ class PARQUET_EXPORT WriterProperties { return this; } - /// Disable decimal logical type with 1 <= precision <= 18 to be stored as - /// integer physical type. + /// Disable decimal logical type with 1 <= precision <= 18 to be stored + /// as integer physical type. /// /// Default disabled. Builder* disable_store_decimal_as_integer() { store_decimal_as_integer_ = false; return this; } - /// Enable writing page index in general for all columns. Default disabled. Review Comment: This should remain three slashes. ########## cpp/src/arrow/dataset/file_parquet.cc: ########## Review Comment: In this file, we can't assume that `PARQUET_REQUIRE_ENCRYPTION=ON`. So for example, we can't use `GetFileDecryptionProperties`, except within a `#ifdef PARQUET_REQUIRE_ENCRYPTION ... #endif` block. You should test this build with `PARQUET_REQUIRE_ENCRYPTION=OFF` *and* `ARROW_JSON=OFF` and make sure the unit tests for Parquet and Datasets are still able to pass. ########## cpp/src/parquet/properties.h: ########## @@ -582,8 +604,6 @@ class PARQUET_EXPORT WriterProperties { get(item.first).set_dictionary_enabled(item.second); for (const auto& item : statistics_enabled_) get(item.first).set_statistics_enabled(item.second); - for (const auto& item : page_index_enabled_) - get(item.first).set_page_index_enabled(item.second); Review Comment: This deletion is causing the test failure for the page index. Should add this back in. ########## cpp/src/parquet/CMakeLists.txt: ########## @@ -231,7 +231,8 @@ if(PARQUET_REQUIRE_ENCRYPTION) encryption/key_metadata.cc encryption/key_toolkit.cc encryption/key_toolkit_internal.cc - encryption/local_wrap_kms_client.cc) + encryption/local_wrap_kms_client.cc + encryption/test_in_memory_kms.cc) Review Comment: I think this is the cause of the ODR violation. If you want to add it do your test here: https://github.com/apache/arrow/blob/3fd881cac4476f1f7e6a531fe2864b1820de62ad/cpp/src/arrow/dataset/CMakeLists.txt#L177-L177 You can do so like we do here: https://github.com/apache/arrow/blob/c3acc91c3e21f225ee7cc3cb3f489f0153365f5a/cpp/src/arrow/CMakeLists.txt#L811 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
