albertlockett commented on code in PR #8069:
URL: https://github.com/apache/arrow-rs/pull/8069#discussion_r2305028875


##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -4293,4 +4304,50 @@ mod tests {
         assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024);
         assert_eq!(get_dict_page_size(col1_meta), 1024 * 1024 * 4);
     }
+
+    #[test]
+    fn arrow_writer_run_end_encoded() {
+        // Create a run array of strings
+        let mut builder = StringRunBuilder::<Int16Type>::new();
+        builder.extend(
+            vec![Some("alpha"); 1000]
+                .into_iter()
+                .chain(vec![Some("beta"); 1000]),
+        );
+        let run_array: RunArray<Int16Type> = builder.finish();
+        println!("run_array type: {:?}", run_array.data_type());
+        let schema = Arc::new(Schema::new(vec![Field::new(
+            "ree",
+            run_array.data_type().clone(),
+            run_array.is_nullable(),
+        )]));
+
+        // Write to parquet
+        let mut parquet_bytes: Vec<u8> = Vec::new();
+        let mut writer = ArrowWriter::try_new(&mut parquet_bytes, schema.clone(), None).unwrap();
+        let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(run_array)]).unwrap();
+        writer.write(&batch).unwrap();
+        writer.close().unwrap();
+
+        // Schema of output is plain, not dictionary or REE encoded!!

Review Comment:
   I suspect this happens because we don't handle REE when applying the Arrow
type hint here:
   
https://github.com/apache/arrow-rs/blob/09317688974ee757f0ca18d80bcec12cf32f76d2/parquet/src/arrow/schema/primitive.rs#L40
   
   Because Arrow types don't map 1:1 to Parquet types, the Parquet writer
embeds the encoded Arrow schema in the Parquet file metadata, and the reader
uses it as a hint to convert the Parquet types back to the original Arrow
types.
   
   We might have to add something like this to the
`schema::primitive::apply_hint` function:
   ```rs
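           (_, DataType::RunEndEncoded(_, value)) => {
               // Apply the hint to the type of the REE values
               let hinted = apply_hint(parquet, value.data_type().clone());
               // If the result matches the REE values type, keep the REE hint;
               // otherwise fall back to the hinted inner type
               match &hinted == value.data_type() {
                   true => hint,
                   false => hinted,
               }
           },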
   ```
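
To make the idea concrete without pulling in the real crates, here is a minimal, std-only sketch of the hint resolution. Everything in it is a toy assumption: `ToyType` is a hypothetical stand-in for arrow's `DataType`, and this `apply_hint` takes references and only models a couple of cases, so it is not the actual function in `schema/primitive.rs`. It only illustrates the proposed behaviour: keep the REE hint when the inner value type agrees with what was inferred from the Parquet schema, otherwise fall back to the hinted inner type.

```rust
// Toy stand-in for arrow's DataType (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int16,
    Utf8,
    LargeUtf8,
    // Dictionary(key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
    // RunEndEncoded(run-ends type, values type)
    RunEndEncoded(Box<ToyType>, Box<ToyType>),
}

/// `parquet` is the type inferred from the Parquet schema alone;
/// `hint` is the type recorded in the embedded Arrow schema.
fn apply_hint(parquet: &ToyType, hint: &ToyType) -> ToyType {
    match (parquet, hint) {
        // Same bytes on disk, different Arrow type: trust the hint.
        (ToyType::Utf8, ToyType::LargeUtf8) => hint.clone(),
        // Wrapper encodings: resolve the inner value type first.
        (_, ToyType::Dictionary(_, value)) | (_, ToyType::RunEndEncoded(_, value)) => {
            let hinted = apply_hint(parquet, value);
            if hinted == **value {
                // Inner type agrees with the hint: preserve the wrapper.
                hint.clone()
            } else {
                // Inner type disagrees: use the resolved inner type instead.
                hinted
            }
        }
        // No applicable hint: keep the Parquet-inferred type.
        _ => parquet.clone(),
    }
}

fn main() {
    let ree = ToyType::RunEndEncoded(Box::new(ToyType::Int16), Box::new(ToyType::Utf8));
    let dict = ToyType::Dictionary(Box::new(ToyType::Int16), Box::new(ToyType::Utf8));

    // Hint is compatible with the inferred string type: the wrapper survives.
    println!("{:?}", apply_hint(&ToyType::Utf8, &ree));
    println!("{:?}", apply_hint(&ToyType::Utf8, &dict));
    // Hint is incompatible with the inferred type: the hint is dropped.
    println!("{:?}", apply_hint(&ToyType::Int16, &ree));
}
```

In this toy model, removing the `RunEndEncoded` alternative sends an REE hint to the catch-all arm, which returns the plain Parquet-inferred type. That would be consistent with the plain schema the test observes, though I haven't verified that this is what actually happens in the real code path.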



