[GitHub] [arrow-rs] alamb commented on a change in pull request #1156: Fuzz test different parquet encodings

GitBox Tue, 11 Jan 2022 10:10:23 -0800


alamb commented on a change in pull request #1156:
URL: https://github.com/apache/arrow-rs/pull/1156#discussion_r782408341




##########
File path: parquet/src/arrow/arrow_reader.rs
##########
@@ -511,10 +547,7 @@ mod tests {
                 num_row_groups,
                 num_rows,
                 record_batch_size,
-                null_percent: None,
-                max_data_page_size: 1024 * 1024,
-                max_dict_page_size: 1024 * 1024,
-                writer_version: WriterVersion::PARQUET_1_0,
+                ..Default::default()

Review comment:
       I double-checked that these are the same values as in `impl Default for TestOptions` 👍
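For readers less familiar with the pattern: `..Default::default()` is Rust's struct update syntax, so every field not named explicitly is taken from `TestOptions::default()`. Below is a minimal sketch using a simplified, hypothetical `TestOptions` and a stand-in `WriterVersion` enum rather than the real test module's types. Only the four defaults removed in the diff come from the source; the other three default values are made up for illustration.

```rust
// Hypothetical stand-in for the parquet crate's `WriterVersion`, so this compiles on its own.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum WriterVersion {
    PARQUET_1_0,
    PARQUET_2_0,
}

// Simplified, hypothetical `TestOptions`: only the fields visible in the diff are modeled.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct TestOptions {
    num_row_groups: usize,
    num_rows: usize,
    record_batch_size: usize,
    null_percent: Option<usize>,
    max_data_page_size: usize,
    max_dict_page_size: usize,
    writer_version: WriterVersion,
}

impl Default for TestOptions {
    fn default() -> Self {
        Self {
            // Illustrative values only; the real defaults are not shown in the diff.
            num_row_groups: 2,
            num_rows: 100,
            record_batch_size: 15,
            // These four mirror the explicit values the diff replaces.
            null_percent: None,
            max_data_page_size: 1024 * 1024,
            max_dict_page_size: 1024 * 1024,
            writer_version: WriterVersion::PARQUET_1_0,
        }
    }
}

fn main() {
    let (num_row_groups, num_rows, record_batch_size) = (3, 50, 7);
    // Name the fields that differ; take everything else from `Default`.
    let opts = TestOptions {
        num_row_groups,
        num_rows,
        record_batch_size,
        ..Default::default()
    };
    assert_eq!(opts.max_data_page_size, 1024 * 1024);
    assert_eq!(opts.writer_version, WriterVersion::PARQUET_1_0);
    println!("{:?}", opts);
}
```

The payoff is that when a new field (such as a list of encodings to fuzz) is added to the options struct, call sites that use `..Default::default()` keep compiling without edits.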

##########
File path: parquet/src/arrow/arrow_reader.rs
##########
@@ -382,21 +405,28 @@ mod tests {
 
     #[test]
     fn test_utf8_single_column_reader_test() {
+        let encodings = &[
+            Encoding::PLAIN,
+            Encoding::RLE_DICTIONARY,
+            //Encoding::DELTA_LENGTH_BYTE_ARRAY,

Review comment:
       https://github.com/tustvold/arrow-rs/blob/fuzz-different-encoding/parquet/src/arrow/arrow_array_reader.rs#L543
   
   Looks like that came in with the original array reader in #384 from @yordan-pavlov.
   
   Do you think it would be a good exercise to support `DELTA_LENGTH_BYTE_ARRAY` in the reader? If so, I can file a ticket and see if anyone else is interested in picking it up.
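For whoever picks this up: per the Parquet format specification, a `DELTA_LENGTH_BYTE_ARRAY` page stores all value lengths first, encoded with `DELTA_BINARY_PACKED`, followed by the value bytes concatenated back to back. The sketch below covers only the second half. It assumes the lengths have already been decoded into a plain `Vec<i32>`, and `split_lengths_and_data` and `decode_with_lengths` are hypothetical names, not functions in arrow-rs.

```rust
/// Encoder side (sketch): separate values into their lengths and one concatenated buffer.
/// A real writer would then DELTA_BINARY_PACKED-encode `lengths` ahead of `data`;
/// that step is assumed and omitted here.
fn split_lengths_and_data(values: &[&[u8]]) -> (Vec<i32>, Vec<u8>) {
    let lengths = values.iter().map(|v| v.len() as i32).collect();
    let data = values.concat();
    (lengths, data)
}

/// Decoder side (sketch): rebuild values from already-decoded lengths and the
/// concatenated bytes. Returns `None` on a negative length or if a length
/// runs past the end of `data`.
fn decode_with_lengths(lengths: &[i32], data: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut out = Vec::with_capacity(lengths.len());
    let mut offset = 0usize;
    for &len in lengths {
        if len < 0 {
            return None;
        }
        let end = offset.checked_add(len as usize)?;
        // `get` bounds-checks instead of panicking on malformed input.
        let slice = data.get(offset..end)?;
        out.push(slice.to_vec());
        offset = end;
    }
    Some(out)
}

fn main() {
    let values: [&[u8]; 3] = [b"arrow", b"", b"parquet"];
    let (lengths, data) = split_lengths_and_data(&values);
    assert_eq!(lengths, vec![5, 0, 7]);
    let decoded = decode_with_lengths(&lengths, &data).expect("well-formed input");
    assert_eq!(decoded, vec![b"arrow".to_vec(), Vec::new(), b"parquet".to_vec()]);
}
```

Bounds-checking each slice with `get` means a corrupt lengths prefix yields `None` rather than a panic, which matters for a reader that may be fed fuzzed files.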

##########
File path: parquet/src/arrow/arrow_reader.rs
##########
@@ -358,7 +369,13 @@ mod tests {
             FixedSizeBinaryArray,
             FixedSizeArrayConverter,
             RandFixedLenGen,
-        >(20, ConvertedType::NONE, None, &converter);
+        >(
+            20,
+            ConvertedType::NONE,
+            None,
+            &converter,
+            &[Encoding::PLAIN, Encoding::RLE_DICTIONARY],

Review comment:
       Is the reason for not including `DELTA_BINARY_PACKED` here that this encoding is not supported for `FixedLengthByteArrays`?
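For context on that question: my reading of the Parquet format specification is that `DELTA_BINARY_PACKED` is defined only for `INT32` and `INT64`. The byte-array delta encodings are `DELTA_LENGTH_BYTE_ARRAY` (`BYTE_ARRAY` only) and `DELTA_BYTE_ARRAY` (`BYTE_ARRAY` and `FIXED_LEN_BYTE_ARRAY`). The following minimal sketch filters a candidate encoding list by physical type before fuzzing. It uses hypothetical local enums (a subset of types and encodings) rather than the `parquet` crate's `Encoding` and `Type`, and `spec_allows` / `encodings_to_fuzz` are illustrative names. It says nothing about which combinations the arrow-rs reader currently implements.

```rust
// Hypothetical local enums covering a subset of Parquet physical types and encodings.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int32,
    Int64,
    ByteArray,
    FixedLenByteArray,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Plain,
    RleDictionary,
    DeltaBinaryPacked,
    DeltaLengthByteArray,
    DeltaByteArray,
}

/// Whether the format spec defines `enc` for values of physical type `ty`.
fn spec_allows(enc: Encoding, ty: PhysicalType) -> bool {
    use PhysicalType::*;
    match enc {
        // PLAIN and dictionary encoding are not restricted to particular types.
        Encoding::Plain | Encoding::RleDictionary => true,
        Encoding::DeltaBinaryPacked => matches!(ty, Int32 | Int64),
        Encoding::DeltaLengthByteArray => ty == ByteArray,
        Encoding::DeltaByteArray => matches!(ty, ByteArray | FixedLenByteArray),
    }
}

/// Keep only the candidate encodings that are valid for `ty`, preserving order.
fn encodings_to_fuzz(ty: PhysicalType, candidates: &[Encoding]) -> Vec<Encoding> {
    candidates.iter().copied().filter(|&e| spec_allows(e, ty)).collect()
}

fn main() {
    let candidates = [Encoding::Plain, Encoding::RleDictionary, Encoding::DeltaBinaryPacked];
    let fixed = encodings_to_fuzz(PhysicalType::FixedLenByteArray, &candidates);
    assert_eq!(fixed, vec![Encoding::Plain, Encoding::RleDictionary]);
    let ints = encodings_to_fuzz(PhysicalType::Int32, &candidates);
    assert_eq!(ints, candidates.to_vec());
}
```

Under that reading, `DELTA_BYTE_ARRAY` rather than `DELTA_BINARY_PACKED` would be the delta encoding to consider for fixed-length byte arrays.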



