arthurpassos commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1237236518


##########
cpp/src/parquet/arrow/arrow_reader_writer_test.cc:
##########
@@ -3862,6 +3919,17 @@ TEST(TestArrowReaderAdHoc, CorruptedSchema) {
   TryReadDataFile(path, ::arrow::StatusCode::IOError);
 }
 
+TEST(TestArrowParquet, LargeByteArray) {
+  auto path = test::get_data_file("chunked_string_map.parquet");
+  TryReadDataFile(path, ::arrow::StatusCode::NotImplemented);
+  ArrowReaderProperties reader_properties;
+  reader_properties.set_use_large_binary_variants(true);
+  reader_properties.set_read_dictionary(0, false);

Review Comment:
   It does not look like the encoding was added. Every column in the metadata below still reports `Encodings: PLAIN`:
   
   ```
   arthur@arthur:~/parquet-testing$ ~/arrow/cpp/cmake-build-ninja-debug/debug/parquet-reader ~/parquet-testing/test.parquet --only-metadata --print-key-value-metadata
   File Name: /home/arthur/parquet-testing/test.parquet
   Version: 2.6
   Created By: parquet-cpp-arrow version 11.0.0
   Total rows: 2
   Key Value File Metadata: 1 entries
    Key nr 0 ARROW:schema: /////8gBAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAIAAAC8AAAABAAAAPT+//8AAAERFAAAACAAAAAEAAAAAQAAABgAAAAIAAAAZGljdF9hcnIAAAAAqP7//9z+//8AAAANGAAAACAAAAAEAAAAAgAAAEgAAAAUAAAABwAAAGVudHJpZXMA2P7//1T///8AAAECEAAAABgAAAAEAAAAAAAAAAUAAAB2YWx1ZQAAAET///8AAAABIAAAADz///8AAAAFEAAAABQAAAAEAAAAAAAAAAMAAABrZXkALP///6j///8AAAERFAAAABgAAAAEAAAAAQAAABAAAAADAAAAYXJyAFT///+I////AAAADRgAAAAgAAAABAAAAAIAAABwAAAAJAAAAAcAAABlbnRyaWVzAIT///8QABQACAAGAAcADAAAABAAEAAAAAAAAQIQAAAAIAAAAAQAAAAAAAAABQAAAHZhbHVlAAAACAAMAAgABwAIAAAAAAAAASAAAAAQABQACAAAAAcADAAAABAAEAAAAAAAAAUQAAAAGAAAAAQAAAAAAAAAAwAAAGtleQAEAAQABAAAAAAAAAA=
   Number of RowGroups: 1
   Number of Real Columns: 2
   Number of Columns: 4
   Number of Selected Columns: 4
   Column 0: arr.key_value.key (BYTE_ARRAY / String / UTF8)
   Column 1: arr.key_value.value (INT32)
   Column 2: dict_arr.key_value.key (BYTE_ARRAY / String / UTF8)
   Column 3: dict_arr.key_value.value (INT32)
   --- Row Group: 0 ---
   --- Total Bytes: 4294967594 ---
   --- Total Compressed Bytes: 6774 ---
   --- Rows: 2 ---
   Column 0
     Values: 2, Null Values: 0, Distinct Values: 0
     Max: , Min: 
     Compression: BROTLI, Encodings: PLAIN
     Uncompressed Size: 2147483732, Compressed Size: 3326
   Column 1
     Values: 2, Null Values: 0, Distinct Values: 0
     Max: 1, Min: 1
     Compression: BROTLI, Encodings: PLAIN
     Uncompressed Size: 65, Compressed Size: 61
   Column 2
     Values: 2, Null Values: 0, Distinct Values: 0
     Max: , Min: 
     Compression: BROTLI, Encodings: PLAIN
     Uncompressed Size: 2147483732, Compressed Size: 3326
   Column 3
     Values: 2, Null Values: 0, Distinct Values: 0
     Max: 1, Min: 1
     Compression: BROTLI, Encodings: PLAIN
     Uncompressed Size: 65, Compressed Size: 61
   
   ```
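
For context on the sizes in that output, here is a minimal sketch of the arithmetic, assuming the relevant limit is the signed 32-bit offset used by Arrow's regular binary/string layout. `ExceedsInt32Offsets` and the constant names are hypothetical and are not part of the Arrow or Parquet API. The column chunk's uncompressed size also includes page overhead, so it is only a rough proxy for the size of the value data, but it suggests why this file presumably needs the large binary variants.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest byte count addressable with signed 32-bit offsets.
constexpr int64_t kMaxInt32Offset = std::numeric_limits<int32_t>::max();

// Hypothetical helper (not an Arrow API): true when `num_bytes` cannot be
// indexed with int32 offsets.
constexpr bool ExceedsInt32Offsets(int64_t num_bytes) {
  return num_bytes > kMaxInt32Offset;
}

// Uncompressed sizes copied from the parquet-reader output above.
constexpr int64_t kKeyColumnBytes = 2147483732LL;  // columns 0 and 2
constexpr int64_t kValueColumnBytes = 65;          // columns 1 and 3

// Each key column is 85 bytes past the int32 limit.
static_assert(ExceedsInt32Offsets(kKeyColumnBytes), "key column exceeds int32 range");
static_assert(kKeyColumnBytes - kMaxInt32Offset == 85, "overshoot is 85 bytes");

// The four column sizes add up to the row group's reported "Total Bytes".
static_assert(2 * kKeyColumnBytes + 2 * kValueColumnBytes == 4294967594LL,
              "column sizes sum to the row group total");
```

The checks are `static_assert`s, so compiling the snippet is enough to confirm that the numbers in the log are internally consistent.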


