albertlockett commented on code in PR #8069:
URL: https://github.com/apache/arrow-rs/pull/8069#discussion_r2322747346
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -4293,4 +4304,50 @@ mod tests {
         assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024);
         assert_eq!(get_dict_page_size(col1_meta), 1024 * 1024 * 4);
     }
+
+    #[test]
+    fn arrow_writer_run_end_encoded() {
+        // Create a run array of strings
+        let mut builder = StringRunBuilder::<Int16Type>::new();
+        builder.extend(
+            vec![Some("alpha"); 1000]
+                .into_iter()
+                .chain(vec![Some("beta"); 1000]),
+        );
+        let run_array: RunArray<Int16Type> = builder.finish();
+        println!("run_array type: {:?}", run_array.data_type());
+        let schema = Arc::new(Schema::new(vec![Field::new(
+            "ree",
+            run_array.data_type().clone(),
+            run_array.is_nullable(),
+        )]));
+
+        // Write to parquet
+        let mut parquet_bytes: Vec<u8> = Vec::new();
+        let mut writer = ArrowWriter::try_new(&mut parquet_bytes, schema.clone(), None).unwrap();
+        let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(run_array)]).unwrap();
+        writer.write(&batch).unwrap();
+        writer.close().unwrap();
+
+        // Schema of output is plain, not dictionary or REE encoded!!

Review Comment:
   @alamb, related to your comment here: https://github.com/apache/arrow-rs/issues/3520#issuecomment-3254515876

   In the discussion above, we were tentatively thinking that the shortest path to REE support would be to decode the column to a native arrow array using the existing readers, and then convert it to a `RunArray`. A dedicated REE reader could then be added as a follow-up.

   Does that sound like a workable approach?
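   As a minimal sketch (not part of this PR; the helper name `read_back_as_run_array` is hypothetical), the interim read path could look roughly like this: decode the column with the existing `ParquetRecordBatchReader`, which materializes it as a plain `StringArray`, and then re-encode it into a `RunArray` with `StringRunBuilder`:

   ```rust
   use arrow_array::builder::StringRunBuilder;
   use arrow_array::cast::AsArray;
   use arrow_array::types::Int16Type;
   use arrow_array::RunArray;
   use bytes::Bytes;
   use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

   // Hypothetical helper: `parquet_bytes` is assumed to hold a file like the one
   // written by the test above (a single Utf8 column).
   fn read_back_as_run_array(parquet_bytes: Vec<u8>) -> RunArray<Int16Type> {
       // Existing reader path: no REE awareness, the column comes back as plain Utf8
       let reader = ParquetRecordBatchReaderBuilder::try_new(Bytes::from(parquet_bytes))
           .unwrap()
           .build()
           .unwrap();

       // Re-encode the decoded values into a run-end encoded array on the Arrow side
       let mut builder = StringRunBuilder::<Int16Type>::new();
       for batch in reader {
           let batch = batch.unwrap();
           builder.extend(batch.column(0).as_string::<i32>().iter());
       }
       builder.finish()
   }
   ```

   The sketch re-encodes via the builder rather than going through the cast kernel, since cast support for run-end encoded targets would need to be confirmed separately.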