nevi-me commented on a change in pull request #8402:
URL: https://github.com/apache/arrow/pull/8402#discussion_r512238377



##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -175,15 +175,61 @@ fn write_leaves(
             }
             Ok(())
         }
+        ArrowDataType::Dictionary(k, v) => {
+            // Materialize the packed dictionary and let the writer repack it
+            let any_array = array.as_any();
+            let (k2, v2) = match &**k {
+                ArrowDataType::Int32 => {
+                    let typed_array = any_array
+                        .downcast_ref::<arrow_array::Int32DictionaryArray>()
+                        .expect("Unable to get dictionary array");
+
+                    (typed_array.keys(), typed_array.values())
+                }
+                o => unimplemented!("Unknown key type {:?}", o),
+            };
+
+            let k3 = k2;
+            let v3 = v2
+                .as_any()
+                .downcast_ref::<arrow_array::StringArray>()
+                .unwrap();
+
+            // TODO: This removes NULL values; what _should_ be done?
+            // FIXME: Don't use `as`
+            let materialized: Vec<_> = k3
+                .flatten()
+                .map(|k| v3.value(k as usize))
+                .map(ByteArray::from)
+                .collect();
+            //

Review comment:
       Only seeing this now.
   
   Yes, @carols10cents is correct. What makes Arrow-to-Parquet conversion challenging is 
that we have to take an array like `<1, 2, null, 4, null, 6>`, convert it to the non-null 
values `<1, 2, 4, 6>`, and record the definition levels `<1, 1, 0, 1, 0, 1>`.
   It's super-trivial for primitives, but once you start nesting, it becomes 
difficult to even reason about what happens with some exotic combinations, 
especially with nested lists.
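
   For the flat, primitive case, the split could be sketched roughly as below. This is only an illustration: `split_nulls` is a hypothetical helper, not a function in the `arrow` or `parquet` crates, and it uses a plain `&[Option<i32>]` in place of a real Arrow array. It assumes a single nullable, non-nested column, so the maximum definition level is 1.

```rust
/// Hypothetical helper: split a nullable, non-nested column into the
/// non-null values and one definition level per slot.
/// A level of 1 means "value present", 0 means "null".
fn split_nulls(column: &[Option<i32>]) -> (Vec<i32>, Vec<i16>) {
    let mut values = Vec::with_capacity(column.len());
    let mut def_levels = Vec::with_capacity(column.len());

    for slot in column {
        match slot {
            Some(v) => {
                // Present: keep the value and mark the slot as defined.
                values.push(*v);
                def_levels.push(1);
            }
            // Null: no value is stored, only the level records the gap.
            None => def_levels.push(0),
        }
    }

    (values, def_levels)
}
```

   Calling `split_nulls(&[Some(1), Some(2), None, Some(4), None, Some(6)])` gives the values `[1, 2, 4, 6]` and the levels `[1, 1, 0, 1, 0, 1]` from the example above. Nested types would also need repetition levels and a larger maximum definition level, which this sketch does not attempt.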




