wgtmac commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1235436093


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -661,16 +671,18 @@ class ParquetIOTestBase : public ::testing::Test {
   void RoundTripSingleColumn(
       const std::shared_ptr<Array>& values, const std::shared_ptr<Array>& 
expected,
       const std::shared_ptr<::parquet::ArrowWriterProperties>& 
arrow_properties,
+      const ArrowReaderProperties& arrow_reader_properties = 
default_arrow_reader_properties(),
       bool nullable = true) {
     std::shared_ptr<Table> table = MakeSimpleTable(values, nullable);
     this->ResetSink();
+

Review Comment:
   nit: remove this line.



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -439,7 +439,9 @@ void DoSimpleRoundtrip(const std::shared_ptr<Table>& table, 
bool use_threads,
                        int64_t row_group_size, const std::vector<int>& 
column_subset,
                        std::shared_ptr<Table>* out,
                        const std::shared_ptr<ArrowWriterProperties>& 
arrow_properties =

Review Comment:
   ```suggestion
                          const std::shared_ptr<ArrowWriterProperties>& 
arrow_writer_properties =
   ```



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1342,6 +1360,31 @@ TEST_F(TestUInt32ParquetIO, Parquet_1_0_Compatibility) {
 
 using TestStringParquetIO = TestParquetIO<::arrow::StringType>;
 
+TEST_F(TestStringParquetIO, Basics) {

Review Comment:
   Could we use some meaningful name to replace `Basics` here?



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -661,16 +671,18 @@ class ParquetIOTestBase : public ::testing::Test {
   void RoundTripSingleColumn(
       const std::shared_ptr<Array>& values, const std::shared_ptr<Array>& 
expected,
       const std::shared_ptr<::parquet::ArrowWriterProperties>& 
arrow_properties,

Review Comment:
   ```suggestion
         const std::shared_ptr<::parquet::ArrowWriterProperties>& 
arrow_writer_properties,
   ```
   
   Would be good to explicitly add `writer` to the parameter.



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1388,6 +1431,14 @@ TEST_F(TestLargeBinaryParquetIO, Basics) {
   const auto arrow_properties =
       ::parquet::ArrowWriterProperties::Builder().store_schema()->build();
   this->RoundTripSingleColumn(large_array, large_array, arrow_properties);
+
+  ArrowReaderProperties arrow_reader_properties;
+  arrow_reader_properties.set_use_large_binary_variants(true);
+  // Input is narrow array, but expected output is large array, opposite of 
the above tests.
+  // This validates narrow arrays can be read as large arrays.
+  this->RoundTripSingleColumn(narrow_array, large_array,
+                              default_arrow_writer_properties(),

Review Comment:
   If `:parquet::ArrowWriterProperties::Builder().store_schema()` is set, what 
do we return in the current impl? large_array right?



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3862,6 +3919,17 @@ TEST(TestArrowReaderAdHoc, CorruptedSchema) {
   TryReadDataFile(path, ::arrow::StatusCode::IOError);
 }
 
+TEST(TestArrowParquet, LargeByteArray) {
+  auto path = test::get_data_file("chunked_string_map.parquet");
+  TryReadDataFile(path, ::arrow::StatusCode::NotImplemented);
+  ArrowReaderProperties reader_properties;
+  reader_properties.set_use_large_binary_variants(true);
+  reader_properties.set_read_dictionary(0, false);

Review Comment:
   Now `chunked_string_map.parquet` may either be dictionary-encoded or 
plain-encoded, so we may miss some cases. Would be good to modify the test file 
to contain two columns, with one column dictionary-encoded and the other 
plain-encoded. The caveat is that the test file may grow larger. WDYT?



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1342,6 +1360,31 @@ TEST_F(TestUInt32ParquetIO, Parquet_1_0_Compatibility) {
 
 using TestStringParquetIO = TestParquetIO<::arrow::StringType>;
 
+TEST_F(TestStringParquetIO, Basics) {

Review Comment:
   Thanks for adding this test!



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1342,6 +1360,31 @@ TEST_F(TestUInt32ParquetIO, Parquet_1_0_Compatibility) {
 
 using TestStringParquetIO = TestParquetIO<::arrow::StringType>;
 
+TEST_F(TestStringParquetIO, Basics) {
+  std::shared_ptr<Array> values;
+
+  ::arrow::StringBuilder builder;
+  for (size_t i = 0; i < SMALL_SIZE; i++) {
+    ASSERT_OK(builder.Append("abc"));
+  }
+  ASSERT_OK(builder.Finish(&values));
+
+  // Input is narrow array, but expected output is large array, opposite of 
the above tests.

Review Comment:
   The comment seems to be copied from line 1437? `above tests` looks confusing 
here.



##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -1342,6 +1360,31 @@ TEST_F(TestUInt32ParquetIO, Parquet_1_0_Compatibility) {
 
 using TestStringParquetIO = TestParquetIO<::arrow::StringType>;
 
+TEST_F(TestStringParquetIO, Basics) {
+  std::shared_ptr<Array> values;
+
+  ::arrow::StringBuilder builder;
+  for (size_t i = 0; i < SMALL_SIZE; i++) {
+    ASSERT_OK(builder.Append("abc"));
+  }
+  ASSERT_OK(builder.Finish(&values));
+
+  // Input is narrow array, but expected output is large array, opposite of 
the above tests.
+  // This validates narrow arrays can be read as large arrays.

Review Comment:
   This line seems incorrect, it writes `values` and reads `values` in line 
1374.



##########
cpp/src/parquet/arrow/reader.cc:
##########
@@ -1371,9 +1374,15 @@ Result<std::unique_ptr<FileReader>> 
FileReaderBuilder::Build() {
 
 Status OpenFile(std::shared_ptr<::arrow::io::RandomAccessFile> file, 
MemoryPool* pool,
                 std::unique_ptr<FileReader>* reader) {
+  return OpenFile(std::move(file), pool, default_arrow_reader_properties(), 
reader);
+}
+
+Status OpenFile(std::shared_ptr<::arrow::io::RandomAccessFile> file, 
MemoryPool* pool,

Review Comment:
   Would you mind not adding a new public function and use `FileReaderBuilder` 
instead in the test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to